DNA ENCODING A KINESIN-LIKE PROTEIN (HKLP) COMPRISING BIALLELIC MARKERS

FIELD OF THE INVENTION 

The present invention is directed to polynucleotides encoding a human kinesin-like
polypeptide as well as a regulatory region located at the 3'-end of said coding region. The invention
also concerns polypeptides encoded by the kinesin-like gene. The invention also deals with
antibodies directed specifically against such polypeptides that are useful as diagnostic reagents. The
invention further encompasses biallelic markers of the HKLP gene useful in genetic analysis.

BACKGROUND OF THE INVENTION 

The kinesins are mechanochemical proteins utilizing chemical energy from ATP hydrolysis 
to generate mechanical force. The kinesins can bind to and move on microtubules in the presence of
ATP. The ability to move on microtubules has led to the classification of kinesins as microtubule 
motor proteins. The kinesins play important roles in intracellular transport and cell division. 

Several kinesin proteins are involved in vesicle/organelle transport in neurons, and mutants
of kinesin in Drosophila show impaired neuronal function. In humans, defects in kinesin-encoding
genes could cause neurological disorders or syndromes of clinical importance.

The kinesin proteins carry out or facilitate movements of the chromosomes and spindle in
meiosis and mitosis. Defective meiotic kinesins in humans may be a cause of infertility,
spontaneous abortion, neonatal chromosome disorders, and aneuploidy. In mitotically dividing cells,
mutations in kinesin proteins could cause somatic abnormalities or cellular transformation, including
neoplasia.

Finally, the kinesins could be involved in developmental processes, as the localization of
some morphogens has been shown to be microtubule-dependent.

The KIF kinesin superfamily proteins have been identified as candidate motor proteins 
involved in organelle transport. 

Among the KIFs, the murine KIF1A protein has been proposed as a transporter of synaptic
vesicle precursors. KIF1A disruption assays in mice have shown that KIF1A is involved in the
transport of a synaptic vesicle precursor and that KIF1A-mediated axonal transport plays a critical
role in viability, maintenance, and function of neurons, particularly mature neurons (Yonekawa et
al., 1998). The murine KIF1B protein is co-localized with mitochondria in vivo and could be
involved in the transport of mitochondria (Nangaku et al., 1994).

SUMMARY OF THE INVENTION 

The present invention pertains to nucleic acid molecules comprising the genomic sequence 
of a novel human gene which encodes a kinesin-like protein and which has been named HKLP by 
the inventors. The HKLP presents homology with murine KIF1A and KIF1B. The HKLP genomic
sequence comprises regulatory sequences located downstream (3'-end) of the transcribed portion of
said gene, these regulatory sequences being also part of the invention.

The invention also deals with the complete cDNA sequence encoding the HKLP protein, as
well as with the corresponding translation product.

Oligonucleotide probes or primers hybridizing specifically with a HKLP genomic or cDNA
sequence are also part of the present invention, as well as DNA amplification and detection methods
using said primers and probes.

A further object of the invention consists of recombinant vectors comprising any of the
nucleic acid sequences described above, and in particular of recombinant vectors comprising a
HKLP regulatory sequence or a sequence encoding a HKLP protein, as well as of cell hosts and
transgenic non-human animals comprising said nucleic acid sequences or recombinant vectors.

The invention is also directed to biallelic markers that are located within the HKLP genomic
sequence or that are in linkage disequilibrium with the HKLP gene, these biallelic markers
representing useful tools for identifying a statistically significant association between specific
alleles of the HKLP gene and diseases, for example cancer and neurological disorders. These
association methods are within the scope of the invention.

Finally, the invention is directed to methods for the screening of substances or molecules
that inhibit the expression of HKLP, as well as to methods for the screening of substances or
molecules that interact with a HKLP polypeptide or that modulate the activity of a HKLP
polypeptide.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of an exemplary computer system. 

Figure 2 is a flow diagram illustrating one embodiment of a process 200 for comparing a new 
nucleotide or protein sequence with a database of sequences in order to determine the homology levels 
between the new sequence and the sequences in the database.

Figure 3 is a flow diagram illustrating one embodiment of a process 250 in a computer for 
determining whether two sequences are homologous. 

Figure 4 is a flow diagram illustrating one embodiment of an identifier process 300 for 
detecting the presence of a feature in a sequence. 

BRIEF DESCRIPTION OF THE SEQUENCES PROVIDED IN THE SEQUENCE LISTING

SEQ ID Nos 1 and 2 contain the genomic sequence of the HKLP gene comprising the exons
and introns, and the 3' regulatory region (downstream untranscribed region).

SEQ ID No 3 contains a cDNA sequence of the HKLP gene.

SEQ ID No 4 contains the amino acid sequence encoded by the cDNA of SEQ ID No 3.

SEQ ID Nos 5, 6, 7 and 8 respectively contain the nucleotide sequences of the amplicons
10-265, 10-266, 12-592 and 12-783.

SEQ ID No 9 contains a primer containing the additional PU 5' sequence described further
in Example 2.

SEQ ID No 10 contains a primer containing the additional RP 5' sequence described further
in Example 2.

In accordance with the regulations relating to Sequence Listings, the following codes have 
been used in the Sequence Listing to indicate the locations of biallelic markers within the sequences 
and to identify each of the alleles present at the polymorphic base. The code "r" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is an adenine.
The code "y" in the sequences indicates that one allele of the polymorphic base is a thymine, while
the other allele is a cytosine. The code "m" in the sequences indicates that one allele of the
polymorphic base is an adenine, while the other allele is a cytosine. The code "k" in the sequences
indicates that one allele of the polymorphic base is a guanine, while the other allele is a thymine.
The code "s" in the sequences indicates that one allele of the polymorphic base is a guanine, while
the other allele is a cytosine. The code "w" in the sequences indicates that one allele of the
polymorphic base is an adenine, while the other allele is a thymine.
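
The six codes above can be captured in a small lookup table. The following is a minimal sketch, not part of the Sequence Listing itself; the names `BIALLELIC_CODES`, `alleles_for_code` and `alternative_allele` are hypothetical and chosen only for illustration.

```python
# Two-base ambiguity codes as described in the text above.
# The dictionary and function names are illustrative assumptions.
BIALLELIC_CODES = {
    "r": frozenset({"G", "A"}),  # guanine / adenine
    "y": frozenset({"T", "C"}),  # thymine / cytosine
    "m": frozenset({"A", "C"}),  # adenine / cytosine
    "k": frozenset({"G", "T"}),  # guanine / thymine
    "s": frozenset({"G", "C"}),  # guanine / cytosine
    "w": frozenset({"A", "T"}),  # adenine / thymine
}


def alleles_for_code(code):
    """Return the two alleles denoted by a polymorphic-base code."""
    return BIALLELIC_CODES[code.lower()]


def alternative_allele(code, original):
    """Given a code and the original allele, return the other allele."""
    alleles = alleles_for_code(code)
    original = original.upper()
    if original not in alleles:
        raise ValueError(f"{original!r} is not an allele of code {code!r}")
    (other,) = alleles - {original}
    return other
```

For instance, a polymorphic base written "r" whose original allele is G would have A as its alternative allele under this mapping.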

The nucleotide code of the original allele for each biallelic marker is given in the following table:

Biallelic marker    Original allele
12-809-119          C
12-805-115          A
12-790-396          G
12-791-211          G
12-803-125          T
99-33040-321        T
12-810-77           A
12-787-103          A
12-793-383          T
12-792-233          A
99-41009-244        A
99-41009-111        C
12-593-174          T
12-589-152          T
12-785-200          T
12-785-393          A
12-588-103          G
12-603-191          T
12-586-414          G
12-602-196          T
12-602-350          C
12-587-379          A
12-596-124          G
12-808-52           A
12-808-75           G

DETAILED DESCRIPTION OF THE INVENTION

The present invention concerns polynucleotides and polypeptides related to the HKLP gene. 
Oligonucleotide probes and primers hybridizing specifically with a genomic or a cDNA sequence of 
HKLP are also part of the invention. A further object of the invention consists of recombinant 
vectors comprising any of the nucleic acid sequences described in the present invention, and in
particular recombinant vectors comprising a regulatory region of HKLP or a sequence encoding the
HKLP protein, as well as cell hosts comprising said nucleic acid sequences or recombinant vectors.
The invention also encompasses methods of screening of molecules which inhibit the expression of
the HKLP gene or which modulate the activity of the HKLP protein. The invention also deals with
antibodies directed specifically against such polypeptides that are useful as diagnostic reagents.

The invention also concerns HKLP-related biallelic markers which can be used in any
method of genetic analysis including linkage studies in families, linkage disequilibrium studies in
populations and association studies of case-control populations. An important aspect of the present
invention is that biallelic markers allow association studies to be performed to identify genes
involved in complex traits.

Definitions 

Before describing the invention in greater detail, the following definitions are set forth to
illustrate and define the meaning and scope of the terms used to describe the invention herein.

The term "HKLP gene", when used herein, encompasses genomic, mRNA and cDNA
sequences encoding the HKLP protein, including the untranslated regulatory regions of the genomic
DNA.

The term "heterologous protein", when used herein, is intended to designate any protein or 
polypeptide other than the HKLP protein. More particularly, the heterologous protein is a 
compound which can be used as a marker in further experiments with a HKLP regulatory region. 

The term "isolated" requires that the material be removed from its original environment
(e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide
or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system,
is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide
could be part of a composition, and still be isolated in that the vector or composition is not part of its
natural environment.

The term "purified" does not require absolute purity; rather, it is intended as a relative 
definition. Purification of starting material or natural material to at least one order of magnitude, 
preferably two or three orders, and more preferably four or five orders of magnitude is expressly 
contemplated. As an example, purification from 0.1 % concentration to 10 % concentration is two
orders of magnitude. The term "purified" is used herein to describe a polynucleotide or
polynucleotide vector of the invention which has been separated from other compounds including,
but not limited to other nucleic acids, carbohydrates, lipids and proteins (such as the enzymes used
in the synthesis of the polynucleotide), or the separation of covalently closed polynucleotides from
linear polynucleotides. A polynucleotide is substantially pure when at least about 50%, preferably
60 to 75% of a sample exhibits a single polynucleotide sequence and conformation (linear versus
covalently closed). A substantially pure polynucleotide typically comprises about 50%, preferably 60
to 90% weight/weight of a nucleic acid sample, more usually about 95%, and preferably is over
about 99% pure. Polynucleotide purity or homogeneity is indicated by a number of means well
known in the art, such as agarose or polyacrylamide gel electrophoresis of a sample, followed by
visualizing a single polynucleotide band upon staining the gel. For certain purposes higher
resolution can be provided by using HPLC or other means well known in the art.

The term "polypeptide" refers to a polymer of amino acids without regard to the length of
the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of
polypeptide. This term also does not specify or exclude post-expression modifications of
polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups,
acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term
polypeptide. Also included within the definition are polypeptides which contain one or more
analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids
which only occur naturally in an unrelated biological system, modified amino acids from
mammalian systems etc.), polypeptides with substituted linkages, as well as other modifications
known in the art, both naturally occurring and non-naturally occurring.

The term "recombinant polypeptide" is used herein to refer to polypeptides that have been
artificially designed and which comprise at least two polypeptide sequences that are not found as
contiguous polypeptide sequences in their initial natural environment, or to refer to polypeptides
which have been expressed from a recombinant polynucleotide.

The term "purified" is used herein to describe a polypeptide of the invention which has been
separated from other compounds including, but not limited to nucleic acids, lipids, carbohydrates
and other proteins. A polypeptide is substantially pure when at least about 50%, preferably 60 to
75% of a sample exhibits a single polypeptide sequence. A substantially pure polypeptide typically
comprises about 50%, preferably 60 to 90% weight/weight of a protein sample, more usually about
95%, and preferably is over about 99% pure. Polypeptide purity or homogeneity is indicated by a
number of means well known in the art, such as agarose or polyacrylamide gel electrophoresis of a
sample, followed by visualizing a single polypeptide band upon staining the gel. For certain
purposes higher resolution can be provided by using HPLC or other means well known in the art.

As used herein, the term "non-human animal" refers to any non-human vertebrate, birds and
more usually mammals, preferably primates, farm animals such as swine, goats, sheep, donkeys, and
horses, rabbits or rodents, more preferably rats or mice. As used herein, the term "animal" is used to
refer to any vertebrate, preferably a mammal. Both the terms "animal" and "mammal" expressly
embrace human subjects unless preceded with the term "non-human".

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides which
are comprised of at least one binding domain, where an antibody binding domain is formed from the
folding of variable domains of an antibody molecule to form three-dimensional binding spaces with
an internal surface shape and charge distribution complementary to the features of an antigenic
determinant of an antigen, which allows an immunological reaction with the antigen. Antibodies
include recombinant proteins comprising the binding domains, as well as fragments, including Fab,
Fab', F(ab)2, and F(ab')2 fragments.

As used herein, an "antigenic determinant" is the portion of an antigen molecule, in this case
a HKLP polypeptide, that determines the specificity of the antigen-antibody reaction. An "epitope"
refers to an antigenic determinant of a polypeptide. An epitope can comprise as few as 3 amino
acids in a spatial conformation which is unique to the epitope. Generally an epitope consists of at
least 6 such amino acids, and more usually at least 8-10 such amino acids. Methods for determining
the amino acids which make up an epitope include x-ray crystallography, 2-dimensional nuclear
magnetic resonance, and epitope mapping, e.g. the Pepscan method described by Geysen et al. 1984;
PCT Publication No. WO 84/03564; and PCT Publication No. WO 84/03506.

Throughout the present specification, the expression "nucleotide sequence" may be
employed to designate indifferently a polynucleotide or a nucleic acid. More precisely, the
expression "nucleotide sequence" encompasses the nucleic material itself and is thus not restricted to
the sequence information (i.e. the succession of letters chosen among the four base letters) that
biochemically characterizes a specific DNA or RNA molecule.

As used interchangeably herein, the terms "nucleic acids", "oligonucleotides", and
"polynucleotides" include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide
in either single chain or duplex form. The term "nucleotide" is used herein as an adjective to
describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in
single-stranded or duplex form. The term "nucleotide" is also used herein as a noun to refer to individual
nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic
acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a
phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or
polynucleotide. The term "nucleotide" is also used herein to encompass "modified
nucleotides" which comprise at least one of the following modifications: (a) an alternative linking
group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous
sugar; for examples of analogous linking groups, purines, pyrimidines, and sugars see for example PCT
publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by
any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof,
as well as utilizing any purification methods known in the art.

A sequence which is "operably linked" to a regulatory sequence such as a promoter means
that said regulatory element is in the correct location and orientation in relation to the nucleic acid to
control RNA polymerase initiation and expression of the nucleic acid of interest. As used herein, the
term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship.
For instance, a promoter or enhancer is operably linked to a coding sequence if it affects the
transcription of the coding sequence.

The terms "trait" and "phenotype" are used interchangeably herein and refer to any visible,
detectable or otherwise measurable property of an organism such as symptoms of, or susceptibility
to a disease for example. Typically the terms "trait" or "phenotype" are used herein to refer to
symptoms of, or susceptibility to a disease, a beneficial response to or side effects related to a
treatment. Preferably, said trait can be, without being limited to, cancers, developmental diseases,
and neurological diseases.

The term "allele" is used herein to refer to variants of a nucleotide sequence. A biallelic
polymorphism has two forms. Typically the first identified allele is designated as the original allele
whereas other alleles are designated as alternative alleles. Diploid organisms may be homozygous
or heterozygous for an allelic form.

The term "heterozygosity rate" is used herein to refer to the incidence of individuals in a
population which are heterozygous at a particular allele. In a biallelic system, the heterozygosity
rate is on average equal to 2P_a(1 - P_a), where P_a is the frequency of the least common allele. In order
to be useful in genetic studies, a genetic marker should have an adequate level of heterozygosity to
allow a reasonable probability that a randomly selected person will be heterozygous.
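
The heterozygosity formula above is simple enough to check numerically. The following is a minimal sketch; the function name `heterozygosity_rate` is a hypothetical helper, and the restriction of the input to the range 0 to 0.5 is an assumption that follows from the argument being the frequency of the least common of two alleles.

```python
def heterozygosity_rate(least_common_allele_frequency):
    """Average heterozygosity rate 2 * P_a * (1 - P_a) for a biallelic system.

    The argument is the frequency of the least common allele, so it is
    assumed here to lie between 0 and 0.5 inclusive.
    """
    p = least_common_allele_frequency
    if not 0.0 <= p <= 0.5:
        raise ValueError("least common allele frequency must be in [0, 0.5]")
    return 2.0 * p * (1.0 - p)
```

With this formula, allele frequencies of 20% and 30% give rates of 0.32 and 0.42 respectively, which are the figures this specification quotes when defining biallelic markers.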

The term "genotype" as used herein refers to the identity of the alleles present in an individual
or a sample. In the context of the present invention, a genotype preferably refers to the description
of the biallelic marker alleles present in an individual or a sample. The term "genotyping" a sample
or an individual for a biallelic marker consists of determining the specific allele or the specific
nucleotide carried by an individual at a biallelic marker.

The term "haplotype" refers to a combination of alleles present in an individual or a sample.
In the context of the present invention, a haplotype preferably refers to a combination of biallelic 
marker alleles found in a given individual and which may be associated with a phenotype. 

The term "polymorphism" as used herein refers to the occurrence of two or more alternative
genomic sequences or alleles between or among different genomes or individuals. "Polymorphic"
refers to the condition in which two or more variants of a specific genomic sequence can be found in
a population. A "polymorphic site" is the locus at which the variation occurs. A single nucleotide
polymorphism is the replacement of one nucleotide by another nucleotide at the polymorphic site.
Deletion of a single nucleotide or insertion of a single nucleotide also gives rise to single nucleotide
polymorphisms. In the context of the present invention, "single nucleotide polymorphism"
preferably refers to a single nucleotide substitution. Typically, between different individuals, the
polymorphic site may be occupied by two different nucleotides.

The terms "biallelic polymorphism" and "biallelic marker" are used interchangeably herein to
refer to a single nucleotide polymorphism having two alleles at a fairly high frequency in the
population. A "biallelic marker allele" refers to the nucleotide variants present at a biallelic marker
site. Typically, the frequency of the less common allele of the biallelic markers of the present
invention has been validated to be greater than 1%, preferably the frequency is greater than 10%,
more preferably the frequency is at least 20% (i.e. heterozygosity rate of at least 0.32), even more
preferably the frequency is at least 30% (i.e. heterozygosity rate of at least 0.42). A biallelic marker
wherein the frequency of the less common allele is 30% or more is termed a "high quality biallelic
marker".

The location of nucleotides in a polynucleotide with respect to the center of the
polynucleotide is described herein in the following manner. When a polynucleotide has an odd
number of nucleotides, the nucleotide at an equal distance from the 3' and 5' ends of the
polynucleotide is considered to be "at the center" of the polynucleotide, and any nucleotide
immediately adjacent to the nucleotide at the center, or the nucleotide at the center itself, is
considered to be "within 1 nucleotide of the center." With an odd number of nucleotides in a
polynucleotide, any of the five nucleotide positions in the middle of the polynucleotide would be
considered to be within 2 nucleotides of the center, and so on. When a polynucleotide has an even
number of nucleotides, there would be a bond and not a nucleotide at the center of the
polynucleotide. Thus, either of the two central nucleotides would be considered to be "within 1
nucleotide of the center" and any of the four nucleotides in the middle of the polynucleotide would
be considered to be "within 2 nucleotides of the center", and so on. For polymorphisms which
involve the substitution, insertion or deletion of 1 or more nucleotides, the polymorphism, allele or
biallelic marker is "at the center" of a polynucleotide if the difference between the distance from the
substituted, inserted, or deleted polynucleotides of the polymorphism and the 3' end of the
polynucleotide, and the distance from the substituted, inserted, or deleted polynucleotides of the
polymorphism and the 5' end of the polynucleotide is zero or one nucleotide. If this difference is 0
to 3, then the polymorphism is considered to be "within 1 nucleotide of the center." If the difference
is 0 to 5, the polymorphism is considered to be "within 2 nucleotides of the center." If the difference
is 0 to 7, the polymorphism is considered to be "within 3 nucleotides of the center," and so on.
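
The difference-based rule stated above for polymorphisms can be sketched as a short function. This is an illustration only: the function name `center_distance` and the use of 1-based, inclusive start and end positions are assumptions, and the code implements the difference rule for polymorphisms rather than the separate per-nucleotide wording for even-length polynucleotides.

```python
def center_distance(length, start, end):
    """Smallest k such that a polymorphism is 'within k nucleotides of the center'.

    length: number of nucleotides in the polynucleotide.
    start, end: 1-based inclusive positions of the polymorphism (assumed convention).
    A return value of 0 means the polymorphism is 'at the center'.
    """
    if not 1 <= start <= end <= length:
        raise ValueError("positions must satisfy 1 <= start <= end <= length")
    distance_to_5_prime = start - 1      # nucleotides before the polymorphism
    distance_to_3_prime = length - end   # nucleotides after the polymorphism
    difference = abs(distance_to_5_prime - distance_to_3_prime)
    # Difference 0-1 -> at the center, 0-3 -> within 1, 0-5 -> within 2, 0-7 -> within 3.
    return difference // 2
```

For a 47-nucleotide probe with a single-base polymorphism at position 24, both distances are 23 and the marker is at the center; moving the marker to position 25 gives a difference of 2, that is, within 1 nucleotide of the center.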

The terms "complementary" or "complement thereof" are used herein to refer to the
sequences of polynucleotides which are capable of forming Watson & Crick base pairing with another
specified polynucleotide throughout the entirety of the complementary region. For the purpose of the
present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide
when each base in the first polynucleotide is paired with its complementary base. Complementary
bases are, generally, A and T (or A and U), or C and G. "Complement" is used herein as a synonym
for "complementary polynucleotide", "complementary nucleic acid" and "complementary
nucleotide sequence". These terms are applied to pairs of polynucleotides based solely upon their
sequences and not any particular set of conditions under which the two polynucleotides would
actually bind.
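
Because complementarity is defined here purely on sequence, it can be tested without any reference to hybridization conditions. The following is a minimal sketch under two stated assumptions: both sequences are written 5' to 3', and pairing is antiparallel, so the first sequence is compared against the second read in reverse. The names `BASE_PAIRS` and `is_complementary` are hypothetical.

```python
# Allowed Watson & Crick pairs as listed in the text: A-T (or A-U) and C-G.
BASE_PAIRS = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def is_complementary(first, second):
    """True if every base of `first` pairs with the facing base of `second`.

    Both sequences are assumed to be written 5'->3'; `second` is reversed so
    that the strands are compared in antiparallel orientation.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return False
    return all((a, b) in BASE_PAIRS for a, b in zip(first, reversed(second)))
```

Under these assumptions, ATGC and GCAT are complementary over their entire length, whereas ATGC and GCAA are not.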

Variants and Fragments

1- Polynucleotides 

The invention also relates to variants and fragments of the polynucleotides described herein, 
particularly of a HKLP gene containing one or more biallelic markers according to the invention. 

Variants of polynucleotides, as the term is used herein, are polynucleotides that differ from a
reference polynucleotide. A variant of a polynucleotide may be a naturally occurring variant such as
a naturally occurring allelic variant, or it may be a variant that is not known to occur naturally. Such
non-naturally occurring variants of the polynucleotide may be made by mutagenesis techniques,
including those applied to polynucleotides, cells or organisms. Generally, differences are limited so
that the nucleotide sequences of the reference and the variant are closely similar overall and, in many
regions, identical.

Variants of polynucleotides according to the invention include, without being limited to, 
nucleotide sequences which are at least 95% identical to a polynucleotide selected from the group 
consisting of the nucleotide sequences of SEQ ID Nos 1-3 or to any polynucleotide fragment of at least 
8 consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide
sequences of SEQ ID Nos 1-3, and preferably at least 99% identical, more particularly at least 99.5%
identical, and most preferably at least 99.8% identical to a polynucleotide selected from the group
consisting of the nucleotide sequences of SEQ ID Nos 1-3 or to any polynucleotide fragment of at least
8 consecutive nucleotides of a polynucleotide selected from the group consisting of the nucleotide
sequences of SEQ ID Nos 1-3.
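
The identity thresholds above rest on the percentage of sequence identity as this specification defines it: the number of matched positions divided by the total number of positions in the comparison window, multiplied by 100. The following is a minimal sketch of that arithmetic only. It assumes the two sequences have already been optimally aligned by some other tool and are passed in as equal-length strings with "-" marking gaps; the function name `percent_identity` and the gap character are assumptions, and no alignment is performed here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions over a pre-aligned comparison window.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    A position counts as matched only if both sequences carry the same
    non-gap residue there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    matched = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * matched / len(a)
```

As a worked case, a 1000-position window with 5 mismatches gives 99.5%, which meets the "at least 99.5% identical" level but not the 99.8% level.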

Nucleotide changes present in a variant polynucleotide may be silent, which means that they
do not alter the amino acids encoded by the polynucleotide. However, nucleotide changes may also
result in amino acid substitutions, additions, deletions, fusions and truncations in the polypeptide
encoded by the reference sequence. The substitutions, deletions or additions may involve one or
more nucleotides. The variants may be altered in coding or non-coding regions or both. Alterations
in the coding regions may produce conservative or non-conservative amino acid substitutions,
deletions or additions.

In the context of the present invention, particularly preferred embodiments are those in
which the polynucleotides encode polypeptides which retain substantially the same biological
function or activity as the mature HKLP protein, or those in which the polynucleotides encode
polypeptides which maintain or increase a particular biological activity, while reducing a second
biological activity.

A polynucleotide fragment is a polynucleotide having a sequence that is entirely the same as
part but not all of a given nucleotide sequence, preferably the nucleotide sequence of a HKLP gene,
and variants thereof. The fragment can be a portion of an intron or an exon of a HKLP gene. It can
also be a portion of the regulatory regions of HKLP. Preferably, such fragments comprise at least
one of the biallelic markers A1 to A32 or the complements thereto or a biallelic marker in linkage
disequilibrium therewith.

Such fragments may be "free-standing", i.e. not part of or fused to other polynucleotides, or
they may be comprised within a single larger polynucleotide of which they form a part or region.
Indeed, several of these fragments may be present within a single larger polynucleotide.

Optionally, such fragments may consist of, or consist essentially of a contiguous span of at
least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 70, 80, 100, 250, 500 or 1000 nucleotides in length.

2- Polypeptides 

The invention also relates to variants, fragments, analogs and derivatives of the polypeptides 
described herein, including mutated HKLP proteins. 

The variant may be 1) one in which one or more of the amino acid residues are substituted
with a conserved or non-conserved amino acid residue and such substituted amino acid residue may
or may not be one encoded by the genetic code, or 2) one in which one or more of the amino acid
residues includes a substituent group, or 3) one in which the mutated HKLP is fused with another
compound, such as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or 4) one in which the additional amino acids are fused to the mutated HKLP,
such as a leader or secretory sequence or a sequence which is employed for purification of the
mutated HKLP or a preprotein sequence. Such variants are deemed to be within the scope of those
skilled in the art.

A polypeptide fragment is a polypeptide having a sequence that is entirely the same as part
but not all of a given polypeptide sequence, preferably a polypeptide encoded by a HKLP gene and
variants thereof.

In the case of an amino acid substitution in the amino acid sequence of a polypeptide
according to the invention, one or several amino acids can be replaced by "equivalent" amino acids.
The expression "equivalent" amino acid is used herein to designate any amino acid that may be
substituted for one of the amino acids having similar properties, such that one skilled in the art of
peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to
be substantially unchanged. Generally, the following groups of amino acids represent equivalent
changes: (1) Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr; (2) Cys, Ser, Tyr, Thr; (3) Val, Ile, Leu,
Met, Ala, Phe; (4) Lys, Arg, His; (5) Phe, Tyr, Trp, His.
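
The five groups listed above lend themselves to a simple membership test. The following is a minimal sketch using standard three-letter amino acid codes (Gln for glutamine, Ile for isoleucine); the names `EQUIVALENT_GROUPS` and `is_equivalent_substitution` are hypothetical, and treating two residues as equivalent whenever they share at least one group is an assumed reading of the text.

```python
# Groups (1) to (5) of amino acids representing equivalent changes.
EQUIVALENT_GROUPS = (
    frozenset({"Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"}),
    frozenset({"Cys", "Ser", "Tyr", "Thr"}),
    frozenset({"Val", "Ile", "Leu", "Met", "Ala", "Phe"}),
    frozenset({"Lys", "Arg", "His"}),
    frozenset({"Phe", "Tyr", "Trp", "His"}),
)


def is_equivalent_substitution(original, replacement):
    """True if both residues appear together in at least one equivalence group.

    Assumption: sharing any single group is sufficient for equivalence.
    """
    return any(
        original in group and replacement in group
        for group in EQUIVALENT_GROUPS
    )
```

Under this reading, replacing Ala with Val counts as an equivalent change (both are in group 3), while replacing Lys with Glu does not.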

A specific embodiment of a modified HKLP peptide molecule of interest according to the
present invention includes, but is not limited to, a peptide molecule which is resistant to proteolysis,
such as a peptide in which the -CONH- peptide bond is modified and replaced by a (CH2NH) reduced
bond, a (NHCO) retro inverse bond, a (CH2-O) methylene-oxy bond, a (CH2-S) thiomethylene
bond, a (CH2CH2) carba bond, a (CO-CH2) ketomethylene bond, a (CHOH-CH2) hydroxyethylene
bond, a (N-N) bond, an E-alkene bond or also a -CH=CH- bond. The invention also encompasses a
human HKLP polypeptide or a fragment or a variant thereof in which at least one peptide bond has
been modified as described above.

Such fragments may be "free-standing", i.e. not part of or fused to other polypeptides, or 
they may be comprised within a single larger polypeptide of which they form a part or region. 
Indeed, several fragments may be comprised within a single larger polypeptide.

As representative examples of polypeptide fragments of the invention, there may be
mentioned those which have a contiguous span of at least 6 amino acids, preferably at least 8 to 10
amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids.

Identity Between Nucleic Acids Or Polypeptides 

The terms "percentage of sequence identity" and "percentage homology" are used
interchangeably herein to refer to comparisons among polynucleotides and polypeptides, and are
determined by comparing two optimally aligned sequences over a comparison window, wherein the
portion of the polynucleotide or polypeptide sequence in the comparison window may comprise
additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise
additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by
determining the number of positions at which the identical nucleic acid base or amino acid residue
occurs in both sequences to yield the number of matched positions, dividing the number of matched
positions by the total number of positions in the window of comparison and multiplying the result by
100 to yield the percentage of sequence identity. Homology is evaluated using any of the variety of
sequence comparison algorithms and programs known in the art. Such algorithms and programs
include, but are by no means limited to, TBLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW
(Pearson and Lipman, 1988; Altschul et al., 1990; Thompson et al., 1994; Higgins et al., 1996;
Altschul et al., 1993). In a particularly preferred embodiment, protein and
nucleic acid sequence homologies are evaluated using the Basic Local Alignment Search Tool
("BLAST") which is well known in the art (see, e.g., Karlin and Altschul, 1990; Altschul et al.,
1990, 1993, 1997). In particular, five specific BLAST programs are used to perform the following
tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein
sequence database;

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence
database;

(3) BLASTX compares the six-frame conceptual translation products of a query nucleotide
sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide sequence database
translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence against
the six-frame translations of a nucleotide sequence database.
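
The "six-frame conceptual translation" used by BLASTX, TBLASTN and TBLASTX in the list above can
be sketched as follows. This is an illustrative sketch only, not the code of any BLAST program: it
assumes the standard genetic code, an input written in the DNA alphabet (A, C, G, T), and
hypothetical function names. Codons containing other characters are rendered as "X".

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict:
    """Translate three frames on each strand; keys are '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTGA").items():
        print(name, protein)
```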

The BLAST programs identify homologous sequences by identifying similar segments, which are
referred to herein as "high-scoring segment pairs," between a query amino or nucleic acid sequence
and a test sequence which is preferably obtained from a protein or nucleic acid sequence database.
High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring matrix,
many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 matrix
(Gonnet et al., 1992; Henikoff and Henikoff, 1993). Less preferably, the PAM or PAM250
matrices may also be used (see, e.g., Schwartz and Dayhoff, eds., 1978). The BLAST programs
evaluate the statistical significance of all high-scoring segment pairs identified, and preferably
select those segments which satisfy a user-specified threshold of significance, such as a
user-specified percent homology. Preferably, the statistical significance of a high-scoring segment pair is
evaluated using the statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990).
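
The percentage calculation defined at the start of this section (matched positions divided by the
total number of positions in the comparison window, multiplied by 100) can be illustrated with a
short sketch. It assumes that the two sequences have already been optimally aligned by one of the
programs named above and that gaps are written as "-"; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of window positions at which both aligned sequences carry the
    same base or residue. Gap characters ('-') never count as matches, but gapped
    columns still count toward the total number of positions in the window."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # 9 matched positions in a window of 10 positions.
    print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
    # One gapped column: 4 matched positions in a window of 5.
    print(percent_identity("AC-GT", "ACTGT"))
```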

Stringent Hybridization Conditions 

For the purpose of defining such a hybridizing nucleic acid according to the invention, the
stringent hybridization conditions are as follows:

the hybridization step is carried out at 65°C in the presence of 6 x SSC buffer, 5 x Denhardt's
solution, 0.5% SDS and 100 ng/ml of salmon sperm DNA.

The hybridization step is followed by four washing steps:

- two washings of 5 min each, preferably at 65°C in a 2 x SSC and 0.1% SDS buffer;

- one washing of 30 min, preferably at 65°C in a 2 x SSC and 0.1% SDS buffer;

- one washing of 10 min, preferably at 65°C in a 0.1 x SSC and 0.1% SDS buffer,

these hybridization conditions being suitable for a nucleic acid molecule of about 20
nucleotides in length. Needless to say, the hybridization conditions described above are
to be adapted according to the length of the desired nucleic acid, following techniques well known to
the one skilled in the art. The suitable hybridization conditions may for example be adapted
according to the teachings disclosed in the book of Hames and Higgins (1985).

Genomic Sequences Of The HKLP Gene

The present invention concerns the genomic sequence of HKLP comprising the 2 genomic
contigs of SEQ ID Nos 1 and 2. The present invention encompasses the HKLP gene, or HKLP genomic
sequences consisting of, consisting essentially of, or comprising a sequence selected from the group
consisting of SEQ ID Nos 1 and 2, a sequence complementary thereto, as well as fragments and
variants thereof. These polynucleotides may be purified, isolated, or recombinant.

The invention also encompasses purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, or 95% nucleotide identity with
a nucleotide sequence selected from the group consisting of SEQ ID Nos 1 and 2 or a
complementary sequence thereto or a fragment thereof. The nucleotide differences with respect to the
nucleotide sequences of SEQ ID Nos 1 and 2 may be generally randomly distributed throughout the
entire nucleic acid. Nevertheless, preferred nucleic acids are those wherein the nucleotide
differences with respect to the nucleotide sequences of SEQ ID Nos 1 and 2 are predominantly located
outside the coding sequences contained in the exons. These nucleic acids, as well as their fragments
and variants, may be used as oligonucleotide primers or probes in order to detect the presence of a
copy of the HKLP gene in a test sample, or alternatively in order to amplify a target nucleotide
sequence within the HKLP sequences.

Another object of the invention consists of purified, isolated, or recombinant nucleic acids
that hybridize with a nucleotide sequence selected from the group consisting of SEQ ID Nos 1 and
2 or a complementary sequence thereto or a variant thereof, under the stringent hybridization
conditions as defined above.

Particularly preferred nucleic acids of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, or 200 nucleotides of SEQ ID No 1 or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, 10, 20, 30, 40 or 50 of the following
nucleotide positions of SEQ ID No 1: 1-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125,
45210-45440, 45622-45717, 45791-68580, 68675-70246, 70396-72421, 72601-73295,
73434-74648, 74898-83055, 83175-85192, 85279-85609, 85740-85906, 86070-88304, 88396-90585,
90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167, 101274-109465,
109581-110228, 110363-111819, 111882-113636, 113783-113945, 114186-117002, 117075-119676,
and 119677-121162.

Additional preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and 8885-10884.
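
The condition recited above, namely a contiguous span of a minimum length that comprises a minimum
number of the listed nucleotide positions, amounts to an interval-overlap count. The sketch below
illustrates it for the SEQ ID No 2 ranges. It is a reading aid only: the function names and default
thresholds are hypothetical, positions are 1-based and inclusive, and the listed ranges are assumed
not to overlap one another.

```python
# Ranges of SEQ ID No 2 as listed in the text (1-based, inclusive).
SEQ2_RANGES = [(1, 1600), (1751, 2138), (2332, 2539), (2659, 3829), (8885, 10884)]


def positions_in_ranges(span_start: int, span_end: int, ranges) -> int:
    """Count positions of the inclusive span that fall within any listed range."""
    total = 0
    for low, high in ranges:
        overlap = min(span_end, high) - max(span_start, low) + 1
        if overlap > 0:
            total += overlap
    return total


def span_qualifies(span_start: int, span_end: int, ranges,
                   min_length: int = 12, min_positions: int = 1) -> bool:
    """True if the span is long enough and covers enough listed positions."""
    length = span_end - span_start + 1
    return (length >= min_length
            and positions_in_ranges(span_start, span_end, ranges) >= min_positions)


if __name__ == "__main__":
    # A 16-nucleotide span straddling the end of the first range.
    print(positions_in_ranges(1595, 1610, SEQ2_RANGES))
    print(span_qualifies(1595, 1610, SEQ2_RANGES, min_length=15, min_positions=5))
    # A span lying entirely between the first and second ranges.
    print(span_qualifies(1601, 1750, SEQ2_RANGES))
```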

Additional preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or 2, or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of
the following ranges of nucleotide positions of

(a) SEQ ID No 1: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000,
6001-7000, 7001-8000, 8001-9000, 9001-10000, 10001-11000, 11001-12000, 12001-13000,
13001-14000, 14001-15000, 15001-16000, 16001-17000, 17001-18000, 18001-19000, 19001-20000,
20001-21000, 21001-22000, 22001-23000, 23001-24000, 24001-25000, 25001-26000,
26001-27000, 27001-28000, 28001-29000, 29001-30000, 30001-31000, 31001-32000, 32001-33000,
33001-34000, 34001-35000, 35001-36000, 36001-37000, 37001-38000, 38001-39000,
39001-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125, 45210-45440, 45622-45717,
45791-68580, 68675-70246, 70396-72421, 72601-73295, 73434-74648, 74898-83055,
83175-85192, 85279-85609, 85740-85906, 86070-88304, 88396-90585, 90705-91767, 91824-94380,
94490-96296, 96364-97184, 97270-101167, 101274-109465, 109581-110228, 110363-111819,
111882-113636, 113783-113945, 114186-117002, 117075-119676, and 119677-121162; and

(b) SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and 8885-10884.

Additional preferred nucleic acids of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof,
wherein said contiguous span comprises a G at position 7159 of SEQ ID No 1. Further preferred
nucleic acids of the invention include isolated, purified, or recombinant polynucleotides comprising
a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or
1000 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span
comprises a C either at position 2551 or 4500 of SEQ ID No 2. It should be noted that nucleic acid
fragments of any size and sequence may also be comprised by the polynucleotides described in this
section.

The HKLP genomic nucleic acid comprises at least 48 exons. The exon positions in SEQ ID
Nos 1 and 2 are detailed below in Table A. The first exon which has been identified in the
cDNA of the present invention is not comprised in the genomic sequence described in the present
invention. The sequence of the first exon begins at position 1 of SEQ ID No 3 and ends at
position 292. The genomic sequences of SEQ ID Nos 1 and 2 comprise 44 and 4 exons, respectively.

Thus, the invention embodies purified, isolated, or recombinant polynucleotides comprising
a nucleotide sequence selected from the group consisting of the exons of the HKLP gene, or a
sequence complementary thereto. The invention also deals with purified, isolated, or recombinant
nucleic acids comprising a combination of at least two exons of the HKLP gene, wherein the
polynucleotides are arranged within the nucleic acid, from the 5'-end to the 3'-end of said nucleic
acid, in the same order as in SEQ ID Nos 1 and 2.

The position of the introns is detailed in Table A. Thus, the invention embodies purified, 
isolated, or recombinant polynucleotides comprising a nucleotide sequence selected from the group 
consisting of the introns of the HKLP gene, or a sequence complementary thereto. 

Thus, the present invention deals with a purified or isolated nucleic acid encoding a HKLP
protein having the amino acid sequence of SEQ ID No 4 or a peptide fragment or variant thereof. In
a specific embodiment, such a purified or isolated nucleic acid comprises a polynucleotide selected
from the group consisting of SEQ ID Nos 1 and 2, or a complementary sequence thereto or a
fragment or a variant thereof.

The HKLP genomic sequence is covered by two fragments. Indeed, one segment is unknown
in intron 44. The inventors think that this segment, which seems to comprise 20 to 30
nucleotides, forms a superstructure which prevents sequencing. This superstructure comprises
two polyG at each end of the segment.

While this section is entitled "Genomic Sequences of HKLP," it should be noted that nucleic
acid fragments of any size and sequence may also be comprised by the polynucleotides described in
this section, flanking the genomic sequences of HKLP on either side or between two or more such
genomic sequences.

HKLP cDNA Sequences 

The expression of the HKLP gene has been shown to lead to the production of at least one
mRNA species, the nucleic acid sequence of which is set forth in SEQ ID No 3.

Another object of the invention is a purified, isolated, or recombinant nucleic acid
comprising the nucleotide sequence of SEQ ID No 3, complementary sequences thereto, as well as
allelic variants, and fragments thereof. Moreover, preferred polynucleotides of the invention include
purified, isolated, or recombinant HKLP cDNAs consisting of, consisting essentially of, or
comprising the sequence of SEQ ID No 3. Particularly preferred embodiments of the invention
include isolated, purified, or recombinant polynucleotides comprising a contiguous span of at least
12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID
No 3 or the complements thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of
the following nucleotide positions of SEQ ID No 3: 391-1619 and 6988-10682. Additional
preferred embodiments of the invention include isolated, purified, or recombinant polynucleotides
comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150,
200, 500, or 1000 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous
span comprises a nucleotide selected from the group consisting of a C at position 5487, and a C at
position 6265 of SEQ ID No 3.

The invention also pertains to a purified or isolated nucleic acid having at least 95%
nucleotide identity with the nucleotide sequence of SEQ ID No 3 or a fragment thereof or a
complementary sequence thereto, advantageously 99%, preferably 99.5% nucleotide identity and
most preferably 99.8% nucleotide identity with the nucleotide sequence of SEQ ID No 3 or a
fragment thereof or a complementary sequence thereto.

Another object of the invention consists of purified, isolated, or recombinant nucleic acids
that hybridize with the nucleotide sequence of SEQ ID No 3 or a complementary sequence thereto
or a variant thereof, under the stringent hybridization conditions as defined above.

The cDNA of SEQ ID No 3 includes a 5'-UTR region starting from the nucleotide at
position 1 and ending at the nucleotide in position 186 of SEQ ID No 3. The cDNA of SEQ ID No 3
includes a 3'-UTR region starting from the nucleotide at position 5638 and ending at the nucleotide
at position 10682 of SEQ ID No 3. The polyadenylation site starts from the nucleotide at position
10631 and ends at the nucleotide in position 10636 of SEQ ID No 3.

Consequently, the invention concerns purified, isolated, and recombinant nucleic acids
comprising a nucleotide sequence of the 3'UTR of the HKLP cDNA, a sequence complementary
thereto, or an allelic variant thereof.

While this section is entitled "HKLP cDNA Sequences," it should be noted that nucleic acid
fragments of any size and sequence may also be comprised by the polynucleotides described in this
section, flanking the genomic sequences of HKLP on either side or between two or more such
genomic sequences.

Coding Regions

The HKLP open reading frame is contained in the corresponding mRNA of SEQ ID No 3. 
More precisely, the effective HKLP coding sequence (CDS) includes the region between nucleotide 
position 187 (first nucleotide of the ATG codon) and nucleotide position 5637 (end nucleotide of the 
TGA codon) of SEQ ID No 3. 
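
The coordinates given for SEQ ID No 3 can be checked arithmetically. The sketch below uses the CDS
positions stated above together with the 5'-UTR and 3'-UTR positions stated in the cDNA section
(1-186 and 5638-10682). The residue count it derives is an inference that assumes the CDS is a
single uninterrupted reading frame ending in one stop codon; it is not a figure quoted from the
specification, and the names used are hypothetical.

```python
# Regions of the cDNA of SEQ ID No 3 (1-based, inclusive), as stated in the text.
REGIONS = {
    "5'-UTR": (1, 186),
    "CDS": (187, 5637),
    "3'-UTR": (5638, 10682),
}


def region_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive 1-based region."""
    return end - start + 1


def encoded_residues(cds_start: int, cds_end: int) -> int:
    """Number of amino acids encoded, assuming the last codon is a stop codon."""
    length = region_length(cds_start, cds_end)
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3 - 1


if __name__ == "__main__":
    for name, (start, end) in REGIONS.items():
        print(name, region_length(start, end), "nt")
    print("residues (assumed single ORF):", encoded_residues(*REGIONS["CDS"]))
```

Under these assumptions the three regions are contiguous and together account for all 10682
nucleotides of the cDNA.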

The present invention also embodies isolated, purified, and recombinant polynucleotides
which encode polypeptides comprising a contiguous span of at least 6 amino acids, preferably at
least 8 or 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of
SEQ ID No 4, wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the amino acid
positions 1-478 of SEQ ID No 4.

The above disclosed polynucleotide that contains the coding sequence of the HKLP gene
may be expressed in a desired host cell or a desired host organism, when this polynucleotide is
placed under the control of suitable expression signals. The expression signals may be either the
expression signals contained in the regulatory regions in the HKLP gene of the invention or,
alternatively, exogenous regulatory nucleic sequences. Such a polynucleotide, when
placed under the suitable expression signals, may also be inserted in a vector for its expression
and/or amplification.

Regulatory Sequences Of HKLP

As mentioned, the genomic sequence of the HKLP gene contains regulatory sequences in the
non-coding 3'-flanking region that borders the HKLP coding region. The 3'-regulatory sequence of
the HKLP gene is localized between nucleotide position 8885 and nucleotide position 10884 of SEQ
ID No 2. Polynucleotides derived from the 3' regulatory region are useful in order to detect the
presence of at least a copy of a nucleotide sequence of SEQ ID No 2 or a fragment thereof in a test
sample.

In order to identify the relevant biologically active polynucleotide fragments or variants of
the 3' regulatory region from SEQ ID No 2, one skilled in the art will refer to the book of Sambrook
et al. (Sambrook, 1989) which describes the use of a recombinant vector carrying a marker gene (e.g.
beta galactosidase, chloramphenicol acetyl transferase, etc.) the expression of which will be detected
when placed under the control of biologically active polynucleotide fragments or variants of SEQ
ID No 2. The level of reporter protein is assayed and compared to the level obtained from a vector
which lacks an insert in the cloning site. The presence of an elevated expression level in the vector
containing the insert with respect to the control vector indicates the presence of a biologically active
polynucleotide in the insert.

Polynucleotides carrying the regulatory elements located at the 3' end of the HKLP coding
region may be advantageously used to control the transcriptional and translational activity of a
heterologous polynucleotide of interest.

Thus, the present invention also concerns a purified or isolated nucleic acid comprising a
polynucleotide of the 3' regulatory region, or a sequence complementary thereto or a biologically
active fragment or variant thereof.

Preferred fragments of the 3' regulatory region are at least 50, 100, 150, 200, 300 or 400 
bases in length. 

By "biologically active" polynucleotide derivatives of SEQ ID No 2 are meant polynucleotides
comprising or alternatively consisting of a fragment of said polynucleotide which is functional as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide in a
recombinant cell host. Such a fragment could act either as an enhancer or as a repressor.

For the purpose of the invention, a nucleic acid or polynucleotide is "functional" as a
regulatory region for expressing a recombinant polypeptide or a recombinant polynucleotide if said
regulatory polynucleotide contains nucleotide sequences which contain transcriptional and
translational regulatory information, and such sequences are "operably linked" to nucleotide
sequences which encode the desired polypeptide or the desired polynucleotide.

The regulatory polynucleotides of the invention may be prepared from the nucleotide
sequence of SEQ ID No 2 by cleavage using suitable restriction enzymes, as described for example
in the book of Sambrook et al. (1989). The regulatory polynucleotides may also be prepared by
digestion of SEQ ID No 2 by an exonuclease enzyme, such as Bal31 (Wabiko et al., 1986). These
regulatory polynucleotides can also be prepared by nucleic acid chemical synthesis, as described
elsewhere in the specification.

The regulatory polynucleotides according to the invention may be part of a recombinant
expression vector that may be used to express a coding sequence in a desired host cell or host
organism. The recombinant expression vectors according to the invention are described elsewhere
in the specification.

A preferred 3'-regulatory polynucleotide of the invention includes the 3'-untranslated region
(3'-UTR) of the HKLP cDNA, or a biologically active fragment or variant thereof.




Polynucleotide Constructs

The terms "polynucleotide construct" and "recombinant polynucleotide" are used
interchangeably herein to refer to linear or circular, purified or isolated polynucleotides that have
been artificially designed and which comprise at least two nucleotide sequences that are not found as
contiguous nucleotide sequences in their initial natural environment.

DNA Construct That Enables Directing Temporal And Spatial HKLP Gene Expression 
In Recombinant Cell Hosts And In Transgenic Animals. 

In order to study the physiological and phenotypic consequences of a lack of synthesis of the
HKLP protein, both at the cell level and at the multicellular organism level, the invention also
encompasses DNA constructs and recombinant vectors enabling a conditional expression of a
specific allele of the HKLP genomic sequence or cDNA and also of a copy of this genomic sequence
or cDNA harboring substitutions, deletions, or additions of one or more bases with respect to the
HKLP nucleotide sequence of SEQ ID Nos 1-3, or a fragment thereof, these base substitutions,
deletions or additions being located either in an exon, an intron or a regulatory sequence, but
preferably in an exon of the HKLP genomic sequence or within the HKLP cDNA of SEQ ID No 3.
In a preferred embodiment, the HKLP sequence comprises a biallelic marker of the
present invention, preferably one of the biallelic markers A1 to A32.

The present invention embodies recombinant vectors comprising any one of the
polynucleotides described in the present invention.

A first preferred DNA construct is based on the tetracycline resistance operon tet from E.
coli transposon Tn10 for controlling the HKLP gene expression, such as described by Gossen et
al. (1992, 1995) and Furth et al. (1994). Such a DNA construct contains seven tet operator sequences
from Tn10 (tetop) that are fused to a minimal promoter, said minimal promoter being operably
linked to a polynucleotide of interest that codes either for a sense or an antisense oligonucleotide or
for a polypeptide, including a HKLP polypeptide or a peptide fragment thereof. This DNA construct
is functional as a conditional expression system for the nucleotide sequence of interest when the
same cell also comprises a nucleotide sequence coding for either the wild type (tTA) or the mutant
(rTA) repressor fused to the activating domain of viral protein VP16 of herpes simplex virus, placed
under the control of a promoter, such as the HCMV IE1 enhancer/promoter or the MMTV-LTR.
Indeed, a preferred DNA construct of the invention comprises both the polynucleotide containing the
tet operator sequences and the polynucleotide containing a sequence coding for the tTA or the rTA
repressor.

In a specific embodiment, the conditional expression DNA construct contains the sequence
encoding the mutant tetracycline repressor rTA; the expression of the polynucleotide of interest is
then silent in the absence of tetracycline and induced in its presence.

DNA Constructs Allowing Homologous Recombination: Replacement Vectors

A second preferred DNA construct will comprise, from 5'-end to 3'-end: (a) a first
nucleotide sequence that is comprised in the HKLP genomic sequence; (b) a nucleotide sequence
comprising a positive selection marker, such as the marker for neomycin resistance (neo); and (c) a
second nucleotide sequence that is comprised in the HKLP genomic sequence, and is located on the
genome downstream of the first HKLP nucleotide sequence (a).

In a preferred embodiment, this DNA construct also comprises a negative selection marker
located upstream of the nucleotide sequence (a) or downstream of the nucleotide sequence (c).
Preferably, the negative selection marker consists of the thymidine kinase (tk) gene (Thomas et al.,
1986), the hygromycin beta gene (Te Riele et al., 1990), the hprt gene (Van der Lugt et al., 1991;
Reid et al., 1990) or the Diphtheria toxin A fragment (Dt-A) gene (Nada et al., 1993; Yagi et
al., 1990). Preferably, the positive selection marker is located within a HKLP exon sequence so as to
interrupt the sequence encoding a HKLP protein. These replacement vectors are described, for
example, by Thomas et al. (1986; 1987), Mansour et al. (1988) and Koller et al. (1992).

The first and second nucleotide sequences (a) and (c) may be indifferently located within a
HKLP regulatory sequence, an intronic sequence, an exon sequence or a sequence containing both
regulatory and/or intronic and/or exon sequences. The size of the nucleotide sequences (a) and (c)
ranges from 1 to 50 kb, preferably from 1 to 10 kb, more preferably from 2 to 6 kb and most
preferably from 2 to 4 kb.

DNA Constructs Allowing Homologous Recombination: Cre-LoxP System.

These new DNA constructs make use of the site specific recombination system of the P1
phage. The P1 phage possesses a recombinase called Cre which interacts specifically with a 34 base
pair loxP site. The loxP site is composed of two palindromic sequences of 13 bp separated by an 8
bp conserved sequence (Hoess et al., 1986). The recombination by the Cre enzyme between two
loxP sites having an identical orientation leads to the deletion of the DNA fragment located between them.
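
The 13 + 8 + 13 bp architecture of the loxP site and the deletion that results from recombination
between two sites in the same orientation can be illustrated with a toy string model. The loxP
sequence used below is the commonly cited wild-type site and is supplied here as an assumption; it
is not reproduced in the specification. The function names are hypothetical, and the model ignores
all biochemistry (it simply keeps one loxP site in the remaining DNA and returns the excised
segment, which in the cell would be released as a circle).

```python
# Commonly cited wild-type loxP site (assumption, not taken from the specification).
LEFT_ARM = "ATAACTTCGTATA"    # 13 bp recombinase-binding element
SPACER = "GCATACAT"           # 8 bp asymmetric spacer that gives the site its orientation
RIGHT_ARM = "TATACGAAGTTAT"   # 13 bp element, inverted repeat of LEFT_ARM
LOXP = LEFT_ARM + SPACER + RIGHT_ARM

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cre_excise(dna: str):
    """Model excision between the first two loxP sites found in direct orientation.

    Returns (remaining, excised). If fewer than two sites are present, the DNA is
    returned unchanged with an empty excised segment.
    """
    first = dna.find(LOXP)
    if first == -1:
        return dna, ""
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna, ""
    remaining = dna[:first] + dna[second:]   # a single loxP site is left behind
    excised = dna[first:second]              # one loxP site plus the floxed segment
    return remaining, excised


if __name__ == "__main__":
    construct = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"
    remaining, excised = cre_excise(construct)
    print(len(LOXP), remaining == "AAAA" + LOXP + "TTTT")
```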

The Cre-loxP system used in combination with a homologous recombination technique was
first described by Gu et al. (1993, 1994). Briefly, a nucleotide sequence of interest to be
inserted in a targeted location of the genome harbors at least two loxP sites in the same orientation
and located at the respective ends of a nucleotide sequence to be excised from the recombinant
genome. The excision event requires the presence of the recombinase (Cre) enzyme within the
nucleus of the recombinant cell host. The recombinase enzyme may be brought at the desired time
either by (a) incubating the recombinant cell hosts in a culture medium containing this enzyme, by
injecting the Cre enzyme directly into the desired cell, such as described by Araki et al. (1995), or by
lipofection of the enzyme into the cells, such as described by Baubonis et al. (1993); (b) transfecting
the cell host with a vector comprising the Cre coding sequence operably linked to a promoter
functional in the recombinant cell host, which promoter is optionally inducible, said vector being
introduced in the recombinant cell host, such as described by Gu et al. (1993) and Sauer et al. (1988);
or (c) introducing in the genome of the cell host a polynucleotide comprising the Cre coding sequence
operably linked to a promoter functional in the recombinant cell host, which promoter is optionally
inducible, said polynucleotide being inserted in the genome of the cell host either by a random
insertion event or a homologous recombination event, such as described by Gu et al. (1994).

In a specific embodiment, the vector containing the sequence to be inserted in the HKLP
gene by homologous recombination is constructed in such a way that selectable markers are flanked
by loxP sites of the same orientation. It is then possible, by treatment with the Cre enzyme, to eliminate the
selectable markers while leaving the HKLP sequences of interest that have been inserted by a
homologous recombination event. Again, two selectable markers are needed: a positive selection
marker to select for the recombination event and a negative selection marker to select for the
homologous recombination event. Vectors and methods using the Cre-loxP system are described by
Zou et al. (1994).

Thus, a third preferred DNA construct of the invention comprises, from 5'-end to 3'-end: (a)
a first nucleotide sequence that is comprised in the HKLP genomic sequence; (b) a nucleotide
sequence comprising a polynucleotide encoding a positive selection marker, said nucleotide
sequence comprising additionally two sequences defining a site recognized by a recombinase, such
as a loxP site, the two sites being placed in the same orientation; and (c) a second nucleotide
sequence that is comprised in the HKLP genomic sequence, and is located on the genome
downstream of the first HKLP nucleotide sequence (a).

The sequences defining a site recognized by a recombinase, such as a loxP site, are
preferably located within the nucleotide sequence (b) at suitable locations bordering the nucleotide
sequence for which the conditional excision is sought. In one specific embodiment, two loxP sites
are located at each side of the positive selection marker sequence, in order to allow its excision at a
desired time after the occurrence of the homologous recombination event.

In a preferred embodiment of a method using the third DNA construct described above, the
excision of the polynucleotide fragment bordered by the two sites recognized by a recombinase,
preferably two loxP sites, is performed at a desired time, due to the presence within the genome of
the recombinant host cell of a sequence encoding the Cre enzyme operably linked to a promoter
sequence, preferably an inducible promoter, more preferably a tissue-specific promoter sequence and
most preferably a promoter sequence which is both inducible and tissue-specific, such as described
by Gu et al. (1994).

The presence of the Cre enzyme within the genome of the recombinant cell host may result
from the breeding of two transgenic animals, the first transgenic animal bearing the HKLP-derived
sequence of interest containing the loxP sites as described above and the second transgenic animal
bearing the Cre coding sequence operably linked to a suitable promoter sequence, such as described
by Gu et al. (1994).

Spatio-temporal control of the Cre enzyme expression may also be achieved with an
adenovirus based vector that contains the Cre gene thus allowing infection of cells, or in vivo
infection of organs, for delivery of the Cre enzyme, such as described by Anton and Graham (1995)
and Kanegae et al. (1995).

The DNA constructs described above may be used to introduce a desired nucleotide
sequence of the invention, preferably a HKLP genomic sequence or a HKLP cDNA sequence, and
most preferably an altered copy of a HKLP genomic or cDNA sequence, within a predetermined
location of the targeted genome, leading either to the generation of an altered copy of a targeted gene
(knock-out homologous recombination) or to the replacement of a copy of the targeted gene by
another copy sufficiently homologous to allow a homologous recombination event to occur
(knock-in homologous recombination). In a specific embodiment, the DNA constructs described above may
be used to introduce a HKLP genomic sequence or a HKLP cDNA sequence. Optionally, said
sequence comprises at least one biallelic marker of the present invention, preferably at least one
biallelic marker selected from the group consisting of A1 to A32.

Nuclear Antisense DNA Constructs

Other compositions contain a vector of the invention comprising an oligonucleotide
fragment of the nucleic sequence SEQ ID No 3, preferably a fragment including the start codon of
the HKLP gene, as an antisense tool that inhibits the expression of the corresponding HKLP gene.
Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995) or those described in PCT Application No WO
95/24223.

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that
are complementary to the 5' end of the HKLP mRNA. In one embodiment, a combination of
different antisense polynucleotides complementary to different parts of the desired targeted gene is
used.

Preferred antisense polynucleotides according to the present invention are complementary to
a sequence of the mRNAs of HKLP that contains either the translation initiation codon ATG or a
splicing site. Further preferred antisense polynucleotides according to the invention are
complementary to the splicing site of the HKLP mRNA.
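
As a rough illustration of the selection rule above (an antisense polynucleotide of 15 to 200 nucleotides complementary to a region containing the initiation codon), the following Python sketch derives such an oligonucleotide from a sense-strand cDNA. The function names and the example sequence are hypothetical, and the sketch assumes that the first ATG in the input is the initiation codon, which must be confirmed against the annotated sequence.

```python
# Illustrative sketch only: function names and any input sequence are hypothetical.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_over_start_codon(sense_cdna: str, length: int = 20) -> str:
    """Return an antisense oligo (5'->3') covering the first ATG of sense_cdna.

    Assumption: the first ATG found is the translation initiation codon.
    """
    if not 15 <= length <= 200:
        raise ValueError("length must be between 15 and 200 nucleotides")
    if length > len(sense_cdna):
        raise ValueError("sequence is shorter than the requested oligo")
    atg = sense_cdna.upper().find("ATG")
    if atg < 0:
        raise ValueError("no ATG found in the sequence")
    # Roughly center the ATG in the window, then clamp to the sequence bounds.
    begin = atg - (length - 3) // 2
    begin = max(0, min(begin, len(sense_cdna) - length))
    window = sense_cdna[begin:begin + length]
    return reverse_complement(window)
```

The returned oligonucleotide is written 5' to 3'; taking its reverse complement gives back the sense window, which can be used to check that the ATG is covered.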

Preferably, the antisense polynucleotides of the invention have a 3' polyadenylation signal
that has been replaced with a self-cleaving ribozyme sequence, such that RNA polymerase II
transcripts are produced without poly(A) at their 3' ends, these antisense polynucleotides being
incapable of export from the nucleus, such as described by Liu et al. (1994). In a preferred
embodiment, these HKLP antisense polynucleotides also comprise, within the ribozyme cassette, a
histone stem-loop structure to stabilize cleaved transcripts against 3'-5' exonucleolytic degradation,
such as the structure described by Eckner et al. (1991).

Oligonucleotide Probes And Primers

Polynucleotides derived from the HKLP gene are useful in order to detect the presence of at
least a copy of a nucleotide sequence of SEQ ID Nos 1-3, or a fragment, complement, or variant
thereof in a test sample.

Particularly preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 1: 1-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125,
45210-45440, 45622-45717, 45791-68580, 68675-70246, 70396-72421, 72601-73295, 73434-74648,
74898-83055, 83175-85192, 85279-85609, 85740-85906, 86070-88304, 88396-90585,
90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167, 101274-109465,
109581-110228, 110363-111819, 111882-113636, 113783-113945, 114186-117002, 117075-119676, and
119677-121162. Additional preferred probes and primers of the invention include isolated, purified,
or recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35,
40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and 8885-10884.

Additional preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or 2, or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any
one of the following ranges of nucleotide positions of:

(a) SEQ ID No 1: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000,
6001-7000, 7001-8000, 8001-9000, 9001-10000, 10001-11000, 11001-12000, 12001-13000,
13001-14000, 14001-15000, 15001-16000, 16001-17000, 17001-18000, 18001-19000, 19001-20000,
20001-21000, 21001-22000, 22001-23000, 23001-24000, 24001-25000, 25001-26000,
26001-27000, 27001-28000, 28001-29000, 29001-30000, 30001-31000, 31001-32000, 32001-33000,
33001-34000, 34001-35000, 35001-36000, 36001-37000, 37001-38000, 38001-39000,
39001-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125, 45210-45440, 45622-45717,
45791-68580, 68675-70246, 70396-72421, 72601-73295, 73434-74648, 74898-83055,
83175-85192, 85279-85609, 85740-85906, 86070-88304, 88396-90585, 90705-91767, 91824-94380,
94490-96296, 96364-97184, 97270-101167, 101274-109465, 109581-110228, 110363-111819,
111882-113636, 113783-113945, 114186-117002, 117075-119676, and 119677-121162; and

(b) SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and 8885-10884.
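
The condition that a contiguous span "comprises at least 1, 2, 3, 5, or 10" positions of the listed ranges can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: positions are treated as 1-based and inclusive, the function names are hypothetical, and only the SEQ ID No 2 ranges quoted above are used as data.

```python
# Illustrative sketch: ranges are the SEQ ID No 2 ranges quoted in the text,
# treated as 1-based inclusive coordinates. Function names are hypothetical.
SEQ_ID_2_RANGES = [(1, 1600), (1751, 2138), (2332, 2539), (2659, 3829), (8885, 10884)]


def positions_in_ranges(span_start: int, span_end: int, ranges) -> int:
    """Count positions of the span [span_start, span_end] lying in any range.

    Assumes the ranges do not overlap one another, as is the case above.
    """
    total = 0
    for low, high in ranges:
        overlap = min(span_end, high) - max(span_start, low) + 1
        if overlap > 0:
            total += overlap
    return total


def span_qualifies(span_start: int, span_end: int, ranges,
                   min_length: int = 12, min_positions: int = 1) -> bool:
    """True if the span is long enough and covers enough listed positions."""
    length = span_end - span_start + 1
    if length < min_length:
        return False
    return positions_in_ranges(span_start, span_end, ranges) >= min_positions
```

For example, a span covering positions 1590 to 1610 of SEQ ID No 2 contains eleven listed positions (1590 to 1600), whereas a span lying entirely within 1601 to 1750 contains none.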

Additional preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises a G at position 7159 of SEQ ID No 1. Further
preferred probes and primers of the invention include isolated, purified, or recombinant
polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70,
80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof,
wherein said contiguous span comprises a C either at position 2551 or 4500 of SEQ ID No 2.

Other preferred probes and primers of the invention include isolated, purified, or
recombinant polynucleotides comprising a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40,
50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000 nucleotides of SEQ ID No 3 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 3: 391-1619 and 6988-10682. Additional preferred probes and primers of
the invention include isolated, purified, or recombinant polynucleotides comprising a contiguous
span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 500, or 1000
nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span comprises a
nucleotide selected from the group consisting of a C at position 5487, and a C at position 6265 of SEQ
ID No 3.

Thus, the invention also relates to nucleic acid probes characterized in that they hybridize
specifically, under the stringent hybridization conditions defined above, with a nucleic acid selected
from the group consisting of the nucleotide sequences:

a) 1-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125, 45210-45440,
45622-45717, 45791-68580, 68675-70246, 70396-72421, 72601-73295, 73434-74648, 74898-83055,
83175-85192, 85279-85609, 85740-85906, 86070-88304, 88396-90585, 90705-91767,
91824-94380, 94490-96296, 96364-97184, 97270-101167, 101274-109465, 109581-110228,
110363-111819, 111882-113636, 113783-113945, 114186-117002, 117075-119676, and 119677-121162 of
SEQ ID No 1 or a variant thereof or a sequence complementary thereto;

b) 1-1600, 1751-2138, 2332-2539, 2659-3829 and 8885-10884 of SEQ ID No 2 or a variant
thereof or a sequence complementary thereto; and

c) 391-1619 and 6988-10682 of SEQ ID No 3 or a variant thereof or a sequence
complementary thereto.

Additionally, another preferred embodiment of a probe according to the invention consists
of a nucleic acid comprising a biallelic marker selected from the group consisting of A1 to A32 or
the complements thereto, for which the respective locations in the sequence listing are provided in
Table 2.

The invention also relates to a purified and/or isolated nucleotide sequence comprising a
polymorphic base of a HKLP-related biallelic marker, preferably of a biallelic marker selected from
the group consisting of A1 to A32, and the complements thereof. The sequence is between 8 and
1000 nucleotides in length, and preferably comprises at least 8, 10, 12, 15, 18, 20, 25, 35, 40, 50, 60,
70, 80, 100, 250, 500 or 1000 contiguous nucleotides, to the extent that such lengths are consistent
with the specific sequence, of a nucleotide sequence selected from the group consisting of SEQ ID
Nos 1-3 and 5-8 or a variant thereof or a complementary sequence thereto. In one embodiment the
invention encompasses isolated, purified, and recombinant polynucleotides comprising, consisting
of, or consisting essentially of a contiguous span of 8 to 50 nucleotides of any one of SEQ ID Nos 1-3
and 5-8 and the complement thereof, wherein said span includes a HKLP-related biallelic marker
in said sequence; optionally, wherein said HKLP-related biallelic marker is selected from the group
consisting of A1 to A32, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, wherein said HKLP-related biallelic marker is selected from
the group consisting of A1 to A22 and A25 to A32, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, wherein said HKLP-related
biallelic marker is selected from the group consisting of A23 and A24, and the complements
thereof, or optionally the biallelic markers in linkage disequilibrium therewith. These nucleotide
sequences comprise the polymorphic base of either allele 1 or allele 2 of the considered biallelic
marker. Optionally, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the center of
said polynucleotide or at the center of said polynucleotide; optionally, wherein said contiguous span
is 18 to 35 nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide; optionally, wherein said polynucleotide consists of said contiguous span and said
contiguous span is 25 nucleotides in length and said biallelic marker is at the center of said
polynucleotide; optionally, wherein the 3' end of said contiguous span is present at the 3' end of said
polynucleotide; and optionally, wherein the 3' end of said contiguous span is located at the 3' end of
said polynucleotide and said biallelic marker is present at the 3' end of said polynucleotide.
Optionally, said polynucleotide may further comprise a label. Optionally, said polynucleotide can
be attached to a solid support. In a further embodiment, the polynucleotides defined above can be
used alone or in any combination. In a preferred embodiment, said probes consist of, or consist
essentially of, a sequence selected from the following sequences: P1 to P30 and the complementary
sequences thereto.
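
The 25-nucleotide embodiment with the biallelic marker at the center can be sketched as follows. This is a minimal Python illustration, assuming a 1-based marker position on a sense-strand sequence; the function name, the example sequence, and the alleles used in any call are hypothetical and are not taken from the sequence listing.

```python
# Illustrative sketch: builds one probe per allele with the polymorphic base
# at the exact center. All names and example inputs are hypothetical.
def allele_probes(seq: str, marker_pos: int, alleles, length: int = 25) -> dict:
    """Return {allele: probe} where each probe carries the allele at its center.

    marker_pos is the 1-based position of the polymorphic base in seq.
    """
    if length % 2 == 0:
        raise ValueError("an odd length is needed for an exact center position")
    half = length // 2
    idx = marker_pos - 1  # convert to 0-based index
    if idx - half < 0 or idx + half >= len(seq):
        raise ValueError("marker is too close to the end of the sequence")
    left = seq[idx - half:idx]
    right = seq[idx + 1:idx + half + 1]
    return {allele: left + allele + right for allele in alleles}
```

The two probes returned for a biallelic marker differ only at the central nucleotide, so each one is a perfect match for one allele and a single-mismatch hybrid for the other.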

In another embodiment the invention encompasses isolated, purified and recombinant
polynucleotides comprising, consisting of, or consisting essentially of a contiguous span of 8 to 50
nucleotides of SEQ ID Nos 1-3 and 5-8 or the complements thereof, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide, and wherein the 3' end of said
polynucleotide is located at or at least 2, 4, 6, 8, 10, 12, 15, 18, 20, 25, 50, 100, 250, 500, or 1000
nucleotides upstream of a HKLP-related biallelic marker in said sequence, preferably within
20 nucleotides upstream of a HKLP-related biallelic marker in said sequence; optionally, wherein
said HKLP-related biallelic marker is selected from the group consisting of A1 to A32, and the
complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith;
optionally, wherein said HKLP-related biallelic marker is selected from the group consisting of A1
to A22 and A25 to A32, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, wherein said HKLP-related biallelic marker is selected from
the group consisting of A23 and A24, and the complements thereof, or optionally the biallelic
markers in linkage disequilibrium therewith; optionally, wherein the 3' end of said polynucleotide
is located 1 nucleotide upstream of said HKLP-related biallelic marker in said sequence; and
optionally, wherein said polynucleotide consists essentially of a sequence selected from the
following sequences: D1 to D30 and E1 to E30.
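
The embodiment in which the 3' end of the polynucleotide lies 1 nucleotide upstream of the biallelic marker can be illustrated with a short Python sketch. It assumes an uppercase sense-strand sequence and a 1-based marker position; the function names and the default length are hypothetical choices, and the sketch is not a reconstruction of the D or E sequences of the listing.

```python
# Illustrative sketch: primers whose 3' end abuts the polymorphic base, one on
# each strand. Assumes uppercase A/C/G/T input; all names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def adjacent_primers(seq: str, marker_pos: int, length: int = 19):
    """Return (sense_primer, antisense_primer), both written 5'->3'.

    The 3' end of each primer is located 1 nucleotide upstream of the marker
    on its own strand, so that the next base added is the polymorphic base.
    """
    if not 8 <= length <= 50:
        raise ValueError("length must be between 8 and 50 nucleotides")
    idx = marker_pos - 1  # 0-based index of the polymorphic base
    if idx - length < 0 or idx + length >= len(seq):
        raise ValueError("not enough flanking sequence around the marker")
    sense = seq[idx - length:idx]
    antisense = reverse_complement(seq[idx + 1:idx + 1 + length])
    return sense, antisense
```

With such primers, a single-base extension reports the nucleotide present at the marker (or its complement, for the antisense primer).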

In a further embodiment, the invention encompasses isolated, purified, or recombinant
polynucleotides comprising, consisting of, or consisting essentially of a sequence selected from the
following sequences: B1 to B25 and C1 to C25.

In an additional embodiment, the invention encompasses the use of any polynucleotide for,
or polynucleotides for use in determining the identity of the nucleotide at a HKLP-related biallelic
marker or the complements thereof, as well as polynucleotides for use or use of polynucleotides in
amplifying segments of nucleotides comprising a HKLP-related biallelic marker or the complements
thereof; optionally, said determining may be performed in hybridization assays, sequencing assays,
and enzyme-based mismatch detection assays; optionally, said amplifying may be performed by a
PCR or LCR; optionally, wherein said HKLP-related biallelic marker is selected from the group
consisting of A1 to A32, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, wherein said HKLP-related biallelic marker is selected from
the group consisting of A1 to A22 and A25 to A32, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, wherein said HKLP-related
biallelic marker is selected from the group consisting of A23 and A24, and the complements
thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said
polynucleotide may be attached to a solid support, array, or addressable array; optionally, said
polynucleotide may be labeled.

The invention concerns the use of the polynucleotides according to the invention for
determining the identity of the nucleotide at a HKLP-related biallelic marker, preferably in a
hybridization assay, sequencing assay, microsequencing assay, or an enzyme-based mismatch
detection assay and in amplifying segments of nucleotides comprising a HKLP-related biallelic
marker. In addition, the polynucleotides of the invention for use or the use of polynucleotides in
determining the identity of one or more nucleotides at a HKLP-related biallelic marker encompass
polynucleotides with any further limitation described in this disclosure, or those following, specified
alone or in any combination.

The primers and probes can be prepared by any suitable method, including, for example,
cloning and restriction of appropriate sequences and direct chemical synthesis by a method such as
the phosphotriester method of Narang et al. (1979), the phosphodiester method of Brown et al. (1979),
the diethylphosphoramidite method of Beaucage et al. (1981) and the solid support method described
in EP 0 707 592. The disclosures of all these documents are incorporated herein by reference.

The formation of stable hybrids depends on the melting temperature (Tm) of the DNA. The
Tm depends on the length of the primer or probe, the ionic strength of the solution and the G+C
content. The higher the G+C content of the primer or probe, the higher is the melting temperature
because G:C pairs are held by three H bonds whereas A:T pairs have only two. The GC content in
the probes of the invention usually ranges between 10 and 75 %, preferably between 35 and 60 %,
and more preferably between 40 and 55 %.
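
The GC ranges above can be checked with a few lines of Python. The sketch below also includes the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is not part of this disclosure: it is a commonly used rule of thumb for short oligonucleotides that ignores ionic strength, and it is shown only to illustrate why a higher G+C content raises the Tm. All function names are hypothetical.

```python
# Illustrative sketch: GC content tiers follow the percentages quoted in the
# text; the Wallace rule is an outside rule of thumb, not the source's method.
def gc_percent(seq: str) -> float:
    """Return the G+C content of seq as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_tier(seq: str) -> str:
    """Classify a probe against the GC ranges given in the text."""
    gc = gc_percent(seq)
    if 40 <= gc <= 55:
        return "more preferred"
    if 35 <= gc <= 60:
        return "preferred"
    if 10 <= gc <= 75:
        return "usual"
    return "outside usual range"


def wallace_tm(seq: str) -> int:
    """Approximate Tm in degrees Celsius: 2 per A/T plus 4 per G/C.

    Only a rough estimate for short oligonucleotides; it does not account
    for the ionic strength of the solution.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Under this approximation, two probes of equal length differ in estimated Tm by 2 degrees Celsius for every A:T pair replaced by a G:C pair.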

Detection probes are generally nucleic acid sequences or uncharged nucleic acid analogs
such as, for example, peptide nucleic acids which are disclosed in International Patent Application
WO 92/20702, and morpholino analogs which are described in U.S. Patents Numbered 5,185,444;
5,034,506 and 5,142,047. The probe may have to be rendered "non-extendable" in that additional
dNTPs cannot be added to the probe. In and of themselves analogs usually are non-extendable and
nucleic acid probes can be rendered non-extendable by modifying the 3' end of the probe such that
the hydroxyl group is no longer capable of participating in elongation. For example, the 3' end of
the probe can be functionalized with the capture or detection label to thereby consume or otherwise
block the hydroxyl group. Alternatively, the 3' hydroxyl group simply can be cleaved, replaced or
modified. U.S. Patent Application Serial No. 07/049,061 filed April 19, 1993 describes
modifications which can be used to render a probe non-extendable.

A probe or a primer according to the invention has between 8 and 1000 nucleotides in
length, or is specified to be at least 12, 15, 18, 20, 25, 35, 40, 50, 60, 70, 80, 100, 250, 500 or 1000
nucleotides in length. More particularly, the length of these probes and primers can range from 8,
10, 15, 20, or 30 to 100 nucleotides, preferably from 10 to 50, more preferably from 15 to 30
nucleotides. Shorter probes and primers tend to lack specificity for a target nucleic acid sequence
and generally require cooler temperatures to form sufficiently stable hybrid complexes with the
template. Longer probes and primers are expensive to produce and can sometimes self-hybridize to
form hairpin structures. The appropriate length for primers and probes under a particular set of
assay conditions may be empirically determined by one of skill in the art. A preferred probe or
primer consists of a nucleic acid comprising a polynucleotide selected from the group of the
nucleotide sequences of P1 to P30 and the complementary sequence thereto, B1 to B25, C1 to C25,
D1 to D30, E1 to E30, for which the respective locations in the sequence listing are provided in
Tables 1, 2, 3 and 4.

Any of the polynucleotides of the present invention can be labeled, if desired, by
incorporating a label detectable by spectroscopic, photochemical, biochemical, immunochemical, or
chemical means. For example, useful labels include radioactive substances (32P, 35S, 3H, 125I),
fluorescent dyes (5-bromodeoxyuridine, fluorescein, acetylaminofluorene, digoxigenin) or biotin.
Preferably, polynucleotides are labeled at their 3' and 5' ends. Examples of non-radioactive labeling
of nucleic acid fragments are described in the French patent No. FR-78 10975 or by Urdea et al.
(1988) or Sanchez-Pescador et al. (1988). In addition, the probes according to the present invention
may have structural characteristics such that they allow the signal amplification, such structural
characteristics being, for example, branched DNA probes as those described by Urdea et al. in 1991
or in the European patent No. EP 0 225 807 (Chiron).

A label can also be used to capture the primer, so as to facilitate the immobilization of either
the primer or a primer extension product, such as amplified DNA, on a solid support. A capture
label is attached to the primers or probes and can be a specific binding member which forms a
binding pair with the solid phase reagent's specific binding member (e.g. biotin and streptavidin).
Therefore depending upon the type of label carried by a polynucleotide or a probe, it may be
employed to capture or to detect the target DNA. Further, it will be understood that the
polynucleotides, primers or probes provided herein, may, themselves, serve as the capture label. For
example, in the case where a solid phase reagent's binding member is a nucleic acid sequence, it
may be selected such that it binds a complementary portion of a primer or probe to thereby
immobilize the primer or probe to the solid phase. In cases where a polynucleotide probe itself
serves as the binding member, those skilled in the art will recognize that the probe will contain a
sequence or "tail" that is not complementary to the target. In the case where a polynucleotide primer
itself serves as the capture label, at least a portion of the primer will be free to hybridize with a
nucleic acid on a solid phase. DNA labeling techniques are well known to the skilled technician.

The probes of the present invention are useful for a number of purposes. They can be
notably used in Southern hybridization to genomic DNA. The probes can also be used to detect
PCR amplification products. They may also be used to detect mismatches in the HKLP gene or
mRNA using other techniques.

Any of the polynucleotides, primers and probes of the present invention can be conveniently
immobilized on a solid support. Solid supports are known to those skilled in the art and include the
walls of wells of a reaction tray, test tubes, polystyrene beads, magnetic beads, nitrocellulose strips,
membranes, microparticles such as latex particles, sheep (or other animal) red blood cells, duracytes
and others. The solid support is not critical and can be selected by one skilled in the art. Thus, latex
particles, microparticles, magnetic or non-magnetic beads, membranes, plastic tubes, walls of
microtiter wells, glass or silicon chips, sheep (or other suitable animal's) red blood cells and
duracytes are all suitable examples. Suitable methods for immobilizing nucleic acids on solid
phases include ionic, hydrophobic, covalent interactions and the like. A solid support, as used
herein, refers to any material which is insoluble, or can be made insoluble by a subsequent reaction.
The solid support can be chosen for its intrinsic ability to attract and immobilize the capture reagent.
Alternatively, the solid phase can retain an additional receptor which has the ability to attract and
immobilize the capture reagent. The additional receptor can include a charged substance that is
oppositely charged with respect to the capture reagent itself or to a charged substance conjugated to
the capture reagent. As yet another alternative, the receptor molecule can be any specific binding
member which is immobilized upon (attached to) the solid support and which has the ability to
immobilize the capture reagent through a specific binding reaction. The receptor molecule enables
the indirect binding of the capture reagent to a solid support material before the performance of the
assay or during the performance of the assay. The solid phase thus can be a plastic, derivatized
plastic, magnetic or non-magnetic metal, glass or silicon surface of a test tube, microtiter well, sheet,
bead, microparticle, chip, sheep (or other suitable animal's) red blood cells, duracytes® and other
configurations known to those of ordinary skill in the art. The polynucleotides of the invention can
be attached to or immobilized on a solid support individually or in groups of at least 2, 5, 8, 10, 12,
15, 20, or 25 distinct polynucleotides of the invention to a single solid support. In addition,
polynucleotides other than those of the invention may be attached to the same solid support as one or
more polynucleotides of the invention.

Consequently, the invention also deals with a method for detecting the presence of a nucleic
acid comprising a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-3, a
fragment or a variant thereof and a complementary sequence thereto in a sample, said method
comprising the following steps of:

a) bringing into contact a nucleic acid probe or a plurality of nucleic acid probes which can
hybridize with a nucleotide sequence included in a nucleic acid selected from the group consisting of
the nucleotide sequences of SEQ ID Nos 1-3, a fragment or a variant thereof and a complementary
sequence thereto and the sample to be assayed; and

b) detecting the hybrid complex formed between the probe and a nucleic acid in the sample.

The invention further concerns a kit for detecting the presence of a nucleic acid comprising a
nucleotide sequence selected from a group consisting of SEQ ID Nos 1-3, a fragment or a variant
thereof and a complementary sequence thereto in a sample, said kit comprising:

a) a nucleic acid probe or a plurality of nucleic acid probes which can hybridize with a
nucleotide sequence included in a nucleic acid selected from the group consisting of the nucleotide
sequences of SEQ ID Nos 1-3, a fragment or a variant thereof and a complementary sequence
thereto; and

b) optionally, the reagents necessary for performing the hybridization reaction.

In a first preferred embodiment of this detection method and kit, said nucleic acid probe or
the plurality of nucleic acid probes are labeled with a detectable molecule. In a second preferred
embodiment of said method and kit, said nucleic acid probe or the plurality of nucleic acid probes
has been immobilized on a substrate. In a third preferred embodiment, the nucleic acid probe or the
plurality of nucleic acid probes comprise either a sequence which is selected from the group
consisting of the nucleotide sequences of P1 to P30 and the complementary sequence thereto, B1 to
B25, C1 to C25, D1 to D30, E1 to E30 or a biallelic marker selected from the group consisting of A1
to A32 and the complements thereto.

Oligonucleotide Arrays

A substrate comprising a plurality of oligonucleotide primers or probes of the invention may
be used either for detecting or amplifying targeted sequences in the HKLP gene and may also be
used for detecting mutations in the coding or in the non-coding sequences of the HKLP gene.

Any polynucleotide provided herein may be attached in overlapping areas or at random
locations on the solid support. Alternatively the polynucleotides of the invention may be attached in
an ordered array wherein each polynucleotide is attached to a distinct region of the solid support
which does not overlap with the attachment site of any other polynucleotide. Preferably, such an
ordered array of polynucleotides is designed to be "addressable" where the distinct locations are
recorded and can be accessed as part of an assay procedure. Addressable polynucleotide arrays
typically comprise a plurality of different oligonucleotide probes that are coupled to a surface of a
substrate in different known locations. The knowledge of the precise location of each
polynucleotide makes these "addressable" arrays particularly useful in hybridization
assays. Any addressable array technology known in the art can be employed with the
polynucleotides of the invention. One particular embodiment of these polynucleotide arrays is
known as Genechips™, and has been generally described in US Patent 5,143,854 and PCT
publications WO 90/15070 and WO 92/10092. These arrays may generally be produced using
mechanical synthesis methods or light directed synthesis methods which incorporate a combination
of photolithographic methods and solid phase oligonucleotide synthesis (Fodor et al., 1991). The
immobilization of arrays of oligonucleotides on solid supports has been rendered possible by the
development of a technology generally identified as "Very Large Scale Immobilized Polymer
Synthesis" (VLSIPS™) in which, typically, probes are immobilized in a high density array on a
solid surface of a chip. Examples of VLSIPS™ technologies are provided in US Patents 5,143,854
and 5,412,087 and in PCT Publications WO 90/15070, WO 92/10092 and WO 95/11995, which
describe methods for forming oligonucleotide arrays through techniques such as light-directed
synthesis techniques. In designing strategies aimed at providing arrays of nucleotides immobilized
on solid supports, further presentation strategies were developed to order and display the
oligonucleotide arrays on the chips in an attempt to maximize hybridization patterns and sequence
information. Examples of such presentation strategies are disclosed in PCT Publications WO
94/12305, WO 94/11530, WO 97/29212 and WO 97/31256.
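
The notion of an "addressable" array, in which each probe occupies a distinct recorded location, can be modeled very simply in software. The Python sketch below is only a toy data model under stated assumptions: a rectangular grid, hypothetical probe names and sequences, and an arbitrary signal threshold. It does not describe any particular commercial array format.

```python
# Illustrative sketch: a toy model of an addressable array. Probe names,
# sequences, grid shape and threshold are all hypothetical.
def build_addressable_array(probes: dict, n_cols: int) -> dict:
    """Assign each probe a distinct (row, column) address.

    probes maps a probe name to its sequence; the result maps each
    address to a (name, sequence) pair, so no two probes share a location.
    """
    if n_cols < 1:
        raise ValueError("n_cols must be at least 1")
    return {
        divmod(i, n_cols): (name, seq)
        for i, (name, seq) in enumerate(probes.items())
    }


def probes_with_signal(array: dict, signals: dict, threshold: float) -> list:
    """Return the names of probes whose address shows a signal >= threshold."""
    return sorted(
        array[address][0]
        for address, value in signals.items()
        if address in array and value >= threshold
    )
```

Because the location of every probe is recorded, a hybridization signal measured at a given address can be traced back directly to the probe sequence attached there.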

In another embodiment of the oligonucleotide arrays of the invention, an oligonucleotide
probe matrix may advantageously be used to detect mutations occurring in the HKLP gene and in its
regulatory region. For this particular purpose, probes are specifically designed to have a nucleotide
sequence allowing their hybridization to the genes that carry known mutations (either by deletion,
insertion or substitution of one or several nucleotides). By known mutations, it is meant mutations
on the HKLP gene that have been identified according, for example, to the technique used by Huang
et al. (1996) or Samson et al. (1996).

Another technique that is used to detect mutations in the HKLP gene is the use of a
high-density DNA array. Each oligonucleotide probe constituting a unit element of the high density DNA
array is designed to match a specific subsequence of the HKLP genomic DNA or cDNA. Thus, an
array consisting of oligonucleotides complementary to subsequences of the target gene sequence is
used to determine the identity of the target sequence with the wild-type gene sequence, measure its
amount, and detect differences between the target sequence and the reference wild-type gene sequence of
the HKLP gene. In one such design, termed a 4L tiled array, a set of four probes (A,
C, G, T), preferably 15-nucleotide oligomers, is implemented. In each set of four probes, the perfect complement
will hybridize more strongly than mismatched probes. Consequently, a nucleic acid target of length
L is scanned for mutations with a tiled array containing 4L probes, the whole probe set containing all
the possible mutations in the known wild-type reference sequence. The hybridization signals of the
15-mer probe set tiled array are perturbed by a single base change in the target sequence. As a
consequence, there is a characteristic loss of signal or a "footprint" for the probes flanking a
mutation position. This technique was described by Chee et al. in 1996, which is herein
incorporated by reference.
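
The 4L tiling principle can be sketched in Python as follows. This is a simplified model, not the procedure of Chee et al.: hybridization strength is approximated by counting matching positions, the function names are hypothetical, and probe sets are generated only for positions where a full 15-nucleotide window fits, so the sketch yields 4(L - 14) probes rather than exactly 4L.

```python
# Illustrative sketch of a 4L tiled array with an idealized match-count score.
# Assumes uppercase A/C/G/T sequences; all names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "ACGT"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tiled_probe_sets(reference: str, probe_len: int = 15) -> dict:
    """For each interior position, build four probes differing at the center.

    Returns {center_index: {base: probe}}, each probe being complementary to
    the reference window with `base` substituted at the central position.
    """
    half = probe_len // 2
    sets = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        sets[center] = {
            base: reverse_complement(window[:half] + base + window[half + 1:])
            for base in BASES
        }
    return sets


def match_scores(sample: str, center: int, probe_set: dict, probe_len: int = 15) -> dict:
    """Idealized signal: number of positions at which each probe pairs with the sample."""
    half = probe_len // 2
    target = sample[center - half:center + half + 1]
    return {
        base: sum(x == y for x, y in zip(reverse_complement(probe), target))
        for base, probe in probe_set.items()
    }


def call_base(sample: str, center: int, probe_set: dict) -> str:
    """Return the base whose probe gives the strongest idealized signal."""
    scores = match_scores(sample, center, probe_set)
    return max(scores, key=scores.get)
```

In this model, a single substitution in the sample is called directly by the probe set centered on it, while the probe sets centered within 7 nucleotides on either side lose their perfect match (best score 14 instead of 15), which corresponds to the "footprint" described above.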

Consequently, the invention concerns an array of nucleic acid molecules comprising at least 
one polynucleotide described above as probes and primers. Preferably, the invention concerns an 
array of nucleic acid comprising at least two polynucleotides described above as probes and primers. 

A further object of the invention consists of an array of nucleic acid sequences comprising
either at least one of the sequences selected from the group consisting of P1 to P30, B1 to B25, C1 to
C25, D1 to D30, E1 to E30, the sequences complementary thereto, a fragment thereof of at least 8
consecutive nucleotides thereof, and at least one sequence comprising a biallelic marker selected
from the group consisting of A1 to A32 and the complements thereto.

The invention also pertains to an array of nucleic acid sequences comprising either at least
two of the sequences selected from the group consisting of P1 to P30, B1 to B25, C1 to C25, D1 to
D30, E1 to E30, the sequences complementary thereto, a fragment thereof of at least 8 consecutive
nucleotides thereof, and at least two sequences comprising a biallelic marker selected from the group
consisting of A1 to A32 and the complements thereof.

Amplification of the HKLP gene. 

1. DNA extraction

As for the source of the genomic DNA to be subjected to analysis, any test sample can be 
foreseen without any particular limitation. These test samples include biological samples which can 
be tested by the methods of the present invention described herein and include human and animal 
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and 
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk,
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the context of the 
present invention is from peripheral venous blood of each donor. 

The techniques of DNA extraction are well-known to the skilled technician. Such
techniques are described notably by Mackey et al. (1998).

2. DNA amplification 

DNA amplification techniques are well-known to those skilled in the art. Amplification
techniques that can be used in the context of the present invention include, but are not limited to, the
ligase chain reaction (LCR) described in EP-A-320 308, WO 9320227 and EP-A-439 182, the
disclosures of which are incorporated herein by reference, the polymerase chain reaction (PCR,
RT-PCR) and techniques such as the nucleic acid sequence based amplification (NASBA) described in
Guatelli JC, et al. (1990) and in Compton J. (1991), Q-beta amplification as described in European
Patent Application no 4544610, strand displacement amplification as described in Walker et al.
(1996) and EP-A-684 315, and target mediated amplification as described in PCT Publication WO
9322461, the disclosure of which is incorporated herein by reference.

LCR and Gap LCR are exponential amplification techniques; both depend on DNA ligase to
join adjacent primers annealed to a DNA molecule. In Ligase Chain Reaction (LCR), probe pairs are
used which include two primary (first and second) and two secondary (third and fourth) probes, all
of which are employed in molar excess to target. The first probe hybridizes to a first segment of the
target strand and the second probe hybridizes to a second segment of the target strand, the first and
second segments being contiguous so that the primary probes abut one another in 5' phosphate-3' hydroxyl
relationship, and so that a ligase can covalently fuse or ligate the two probes into a fused
product. In addition, a third (secondary) probe can hybridize to a portion of the first probe and a
fourth (secondary) probe can hybridize to a portion of the second probe in a similar abutting fashion.
Of course, if the target is initially double stranded, the secondary probes also will hybridize to the
target complement in the first instance. Once the ligated strand of primary probes is separated from
the target strand, it will hybridize with the third and fourth probes which can be ligated to form a
complementary, secondary ligated product. It is important to realize that the ligated products are
functionally equivalent to either the target or its complement. By repeated cycles of hybridization
and ligation, amplification of the target sequence is achieved. A method for multiplex LCR has also
been described (WO 9320227). Gap LCR (GLCR) is a version of LCR where the probes are not
adjacent but are separated by 2 to 3 bases.

For amplification of mRNAs, it is within the scope of the present invention to reverse
transcribe mRNA into cDNA followed by polymerase chain reaction (RT-PCR); or to use a single
enzyme for both steps as described in U.S. Patent No. 5,322,770; or to use Asymmetric Gap LCR
(RT-AGLCR) as described by Marshall et al. (1994). AGLCR is a modification of GLCR that
allows the amplification of RNA.

The PCR technology is the preferred amplification technique used in the present invention.
A variety of PCR techniques are familiar to those skilled in the art. For a review of PCR
technology, see White (1997) and the publication entitled "PCR Methods and Applications" (1991,
Cold Spring Harbor Laboratory Press). In each of these PCR procedures, PCR primers on either
side of the nucleic acid sequences to be amplified are added to a suitably prepared nucleic acid
sample along with dNTPs and a thermostable polymerase such as Taq polymerase, Pfu polymerase,
or Vent polymerase. The nucleic acid in the sample is denatured and the PCR primers are
specifically hybridized to complementary nucleic acid sequences in the sample. The hybridized
primers are extended. Thereafter, another cycle of denaturation, hybridization, and extension is
initiated. The cycles are repeated multiple times to produce an amplified fragment containing the
nucleic acid sequence between the primer sites. PCR has further been described in several patents
including US Patents 4,683,195, 4,683,202 and 4,965,188. Each of these publications is
incorporated by reference.
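
The relationship between the two primers and the amplified fragment can be sketched in a few lines of
Python. This is a simplified illustration that assumes exact, mismatch-free primer matching on a single
template strand; the function names and the toy sequences are hypothetical and do not correspond to any
sequence of the invention.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the fragment bounded by the two primer sites, or None.

    The forward primer is searched on the given strand; the reverse primer
    anneals to the opposite strand, so its reverse complement is searched
    downstream of the forward primer site.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


if __name__ == "__main__":
    template = "TTTACGTACGGGGGGCCATTAGCAAA"
    forward = "ACGTAC"
    reverse = reverse_complement("TTAGCA")
    print(in_silico_pcr(template, forward, reverse))
```

The returned fragment includes both primer sites, which matches the statement above that the amplified
fragment contains the nucleic acid sequence between the primer sites.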

One of the aspects of the present invention is a method for the amplification of the human
HKLP gene, particularly of the genomic sequences of SEQ ID Nos 1 and 2 or of the cDNA sequence
of SEQ ID No 3, or a fragment or a variant thereof in a test sample, preferably using the PCR
technology. The method comprises the steps of contacting a test sample suspected of containing the
target HKLP encoding sequence or portion thereof with amplification reaction reagents comprising a
pair of amplification primers, and optionally a detection probe that can hybridize
with an internal region of amplicon sequences to confirm that the desired amplification reaction has
taken place.

Thus, the present invention also relates to a method for the amplification of a human HKLP
gene sequence, particularly of a portion of the genomic sequences of SEQ ID Nos 1 and 2 or of the
cDNA sequence of SEQ ID No 3, or a variant thereof in a test sample, said method comprising the
steps of:

a) contacting a test sample suspected of containing the targeted HKLP gene sequence
comprised in a nucleotide sequence selected from a group consisting of SEQ ID Nos 1-3, or
fragments or variants thereof with amplification reaction reagents comprising a pair of amplification
primers as described above and located on either side of the polynucleotide region to be amplified;
and

b) optionally, detecting the amplification products. 

The invention also concerns a kit for the amplification of a human HKLP gene sequence,
particularly of a portion of the genomic sequences of SEQ ID Nos 1 and 2 or of the cDNA sequence
of SEQ ID No 3, or a variant thereof in a test sample, wherein said kit comprises:

a) a pair of oligonucleotide primers located on either side of the HKLP region to be
amplified; and

b) optionally, the reagents necessary for performing the amplification reaction.

In a first preferred embodiment of the above amplification method or kit, the amplification
product is detected by hybridization with a labeled probe having a sequence which is complementary
to the amplified region.

The primers are more particularly characterized in that they have sufficient complementarity 
with any sequence of a strand of the genomic sequence close to the region to be amplified, for 
example with a non-coding sequence adjacent to exons to amplify. 

In a second preferred embodiment, the nucleic acid primers comprise a sequence which is
selected from the group consisting of the nucleotide sequences of B1 to B25, C1 to C25, D1 to D30,
and E1 to E30.



HKLP Proteins and Polypeptide Fragments:

The term "HKLP polypeptides" is used herein to embrace all of the proteins and
polypeptides of the present invention. Also forming part of the invention are polypeptides encoded
by the polynucleotides of the invention, as well as fusion polypeptides comprising such
polypeptides. The invention embodies HKLP proteins from humans, including isolated or purified
HKLP proteins consisting of, consisting essentially of, or comprising the sequence of SEQ ID No 4. The
HKLP protein is 1816 amino acids in length. The first 700 amino acids of the HKLP protein
show 97% homology with the murine KIF1B protein (Nangaku et al., 1994). The HKLP
protein shows 60-70% homology with the murine KIF1A protein, and more particularly the first 390
amino acids of the HKLP protein have 85% homology therewith.

The present invention embodies isolated, purified, and recombinant polypeptides comprising
a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably
at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span
includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4. In other
preferred embodiments the contiguous stretch of amino acids comprises the site of a mutation or
functional mutation, including a deletion, addition, swap or truncation of the amino acids in the
HKLP protein sequence.
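
The positional condition on such fragments can be checked mechanically. The short Python sketch below
is only an illustration: the function name, the 1-based coordinate convention, and the default
arguments are assumptions introduced here, while the protein length (1816) and the region 1-478 are
taken from the text above.

```python
PROTEIN_LENGTH = 1816  # length of the HKLP protein of SEQ ID No 4, per the text
REGION_END = 478       # last position of the region 1-478 referred to above


def span_qualifies(start, length, min_positions=1, min_length=6):
    """Check a contiguous span against the length and position conditions.

    `start` is the 1-based position of the first amino acid of the span.
    Returns True when the span has at least `min_length` residues and
    includes at least `min_positions` of the positions 1-478.
    """
    end = start + length - 1
    if start < 1 or end > PROTEIN_LENGTH:
        raise ValueError("span falls outside the protein sequence")
    if length < min_length:
        return False
    # Number of span positions that fall within 1..REGION_END.
    overlap = max(0, min(end, REGION_END) - start + 1)
    return overlap >= min_positions


if __name__ == "__main__":
    print(span_qualifies(470, 20, min_positions=5))
    print(span_qualifies(479, 6))
```

For example, a 20-residue span starting at position 470 covers positions 470-489 and therefore includes
nine of the positions 1-478, whereas a span starting at position 479 includes none of them.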

The invention also encompasses purified, isolated, or recombinant polynucleotides
comprising a nucleotide sequence having at least 70, 75, 80, 85, 90, 95, 98 or 99% nucleotide
identity with a nucleotide sequence of SEQ ID No 4 or a complementary sequence thereto or a
fragment thereof.

HKLP proteins are preferably isolated from human or mammalian tissue samples or
expressed from human or mammalian genes. The HKLP polypeptides of the invention can be made
using routine expression methods known in the art. The polynucleotide encoding the desired




polypeptide is ligated into an expression vector suitable for any convenient host. Both eukaryotic
and prokaryotic host systems may be used in forming recombinant polypeptides. The polypeptide is
then isolated from lysed cells or from the culture
medium and purified to the extent needed for its intended use. Purification is by any technique
known in the art, for example, differential extraction, salt fractionation, chromatography,
centrifugation, and the like. See, for example, Methods in Enzymology for a variety of methods for
purifying proteins.

In addition, shorter protein fragments may be produced by chemical synthesis. Alternatively the
proteins of the invention may be extracted from cells or tissues of humans or non-human animals.
Methods for purifying proteins are known in the art, and include the use of detergents or chaotropic
agents to disrupt particles followed by differential extraction and separation of the polypeptides by
ion exchange chromatography, affinity chromatography, sedimentation according to density, and gel
electrophoresis.

Any HKLP cDNA, including SEQ ID No 3, may be used to express HKLP proteins and
polypeptides. The nucleic acid encoding the HKLP protein or polypeptide to be expressed is operably
linked to a promoter in an expression vector using conventional cloning technology. The HKLP insert
in the expression vector may comprise the full coding sequence for the HKLP protein or a portion
thereof. For example, the HKLP derived insert may encode a polypeptide comprising at least 10
consecutive amino acids of the HKLP protein of SEQ ID No 4, wherein said contiguous span includes
at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4.

The expression vector may be any of the mammalian, yeast, insect or bacterial expression systems
known in the art. Commercially available vectors and expression systems are available from a variety
of suppliers including Genetics Institute (Cambridge, MA), Stratagene (La Jolla, California), Promega
(Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and
facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized
for the particular expression organism in which the expression vector is introduced, as explained by
Hatfield, et al., U.S. Patent No. 5,082,767.

In one embodiment, the entire coding sequence of the HKLP cDNA through the poly A signal
of the cDNA is operably linked to a promoter in the expression vector. Alternatively, if the nucleic
acid encoding a portion of the HKLP protein lacks a methionine to serve as the initiation site, an
initiating methionine can be introduced next to the first codon of the nucleic acid using conventional
techniques. Similarly, if the insert from the HKLP cDNA lacks a poly A signal, this sequence can be
added to the construct by, for example, splicing out the poly A signal from pSG5 (Stratagene) using
BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression
vector pXT1 (Stratagene). pXT1 contains the LTRs and a portion of the gag gene from Moloney
Murine Leukemia Virus. The position of the LTRs in the construct allows efficient stable transfection.
The vector includes the Herpes Simplex Thymidine Kinase promoter and the selectable neomycin gene.
The nucleic acid encoding the HKLP protein or a portion thereof is obtained by PCR from a bacterial
vector containing the HKLP cDNA of SEQ ID No 3 using oligonucleotide primers complementary to
the HKLP cDNA or portion thereof and containing restriction endonuclease sequences for PstI
incorporated into the 5' primer and BglII at the 5' end of the corresponding cDNA 3' primer, taking care
to ensure that the sequence encoding the HKLP protein or a portion thereof is positioned properly with
respect to the poly A signal. The purified fragment obtained from the resulting PCR reaction is digested
with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now
containing a poly A signal and digested with BglII.

The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life
Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification.
Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St.
Louis, Missouri).

Alternatively, the nucleic acids encoding the HKLP protein or a portion thereof may be cloned into
pED6dpc2 (Genetics Institute, Cambridge, MA). The resulting pED6dpc2 constructs are transfected into
a suitable host cell, such as COS 1 cells. Methotrexate resistant cells are selected and expanded.

The above procedures may also be used to express a mutant HKLP protein responsible for a 
detectable phenotype or a portion thereof. 

The expressed proteins are purified using conventional purification techniques such as
ammonium sulfate precipitation or chromatographic separation based on size or charge. The protein
encoded by the nucleic acid insert may also be purified using standard immunochromatography
techniques. In such procedures, a solution containing the expressed HKLP protein or portion thereof,
such as a cell extract, is applied to a column having antibodies against the HKLP protein or portion
thereof attached to the chromatography matrix. The expressed protein is allowed to bind the
immunochromatography column. Thereafter, the column is washed to remove non-specifically bound
proteins. The specifically bound expressed protein is then released from the column and recovered
using standard techniques.

To confirm expression of the HKLP protein or a portion thereof, the proteins expressed from
host cells containing an expression vector containing an insert encoding the HKLP protein or a portion
thereof can be compared to the proteins expressed in host cells containing the expression vector without
an insert. The presence of a band in samples from cells containing the expression vector with an insert
which is absent in samples from cells containing the expression vector without an insert indicates that
the HKLP protein or a portion thereof is being expressed. Generally, the band will have the mobility
expected for the HKLP protein or portion thereof. However, the band may have a mobility different
than that expected as a result of modifications such as glycosylation, ubiquitination, or enzymatic
cleavage.

Antibodies capable of specifically recognizing the expressed HKLP protein or a portion thereof
are described below.

If antibody production is not possible, the nucleic acids encoding the HKLP protein or a portion
thereof may be incorporated into expression vectors designed for use in purification schemes employing
chimeric polypeptides. In such strategies the nucleic acid encoding the HKLP protein or a portion
thereof is inserted in frame with the gene encoding the other half of the chimera. The other half of the
chimera may be a β-globin or a nickel binding polypeptide encoding sequence. A chromatography matrix
having antibody to β-globin or nickel attached thereto is then used to purify the chimeric protein.
Protease cleavage sites may be engineered between the β-globin gene or the nickel binding polypeptide and
the HKLP protein or portion thereof. Thus, the two polypeptides of the chimera may be separated from one
another by protease digestion.

One useful expression vector for generating β-globin chimeras is pSG5 (Stratagene), which
encodes rabbit β-globin. Intron II of the rabbit β-globin gene facilitates splicing of the expressed
transcript, and the polyadenylation signal incorporated into the construct increases the level of
expression. These techniques are well known to those skilled in the art of molecular biology. Standard
methods are published in methods texts such as Davis et al. (1986), and many of the methods are
available from Stratagene, Life Technologies, Inc., or Promega. Polypeptide may additionally be
produced from the construct using in vitro translation systems such as the In vitro Express™ Translation
Kit (Stratagene).

Thus, the present invention also concerns a method for producing one of the polypeptides
described herein, and especially a polypeptide of SEQ ID No 4 or a fragment or a variant thereof,
wherein said method comprises the steps of:

a) culturing, in an appropriate culture medium, a cell host previously transformed or
transfected with the recombinant vector comprising a nucleic acid encoding a HKLP polypeptide, or
a fragment or a variant thereof;

b) harvesting the culture medium thus conditioned or lysing the cell host, for example by
sonication or by an osmotic shock;

c) separating or purifying, from the said culture medium, or from the pellet of the resultant
host cell lysate, the thus produced polypeptide of interest; and

d) optionally characterizing the produced polypeptide of interest.

In a specific embodiment of the above method, step a) is preceded by a step wherein the
nucleic acid coding for a HKLP polypeptide, or a fragment or a variant thereof, is inserted in an
appropriate vector, optionally after an appropriate cleavage of this amplified nucleic acid with one or
several restriction endonucleases. The nucleic acid coding for a HKLP polypeptide or a fragment or
a variant thereof may be the resulting product of an amplification reaction using a pair of primers
according to the invention (by SDA, TAS, 3SR, NASBA, TMA, etc.).

Antibodies That Bind HKLP Polypeptides of the Invention
Any HKLP polypeptide or whole protein may be used to generate antibodies capable of
specifically binding to expressed HKLP protein or fragments thereof as described. The antibody
compositions of the invention are capable of specifically binding or specifically bind to the HKLP
protein. For an antibody composition to specifically bind to the HKLP protein it must demonstrate
at least a 5%, 10%, 15%, 20%, 25%, 50%, or 100% greater binding affinity for full length HKLP
protein than for any other full length protein in an ELISA, RIA, or other antibody-based binding assay.

In a preferred embodiment of the invention antibody compositions are capable of selectively
binding, or selectively bind to an epitope-containing fragment of a polypeptide comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said epitope comprises
at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4, wherein said antibody
composition is optionally either polyclonal or monoclonal.

The present invention also contemplates the use of polypeptides comprising a contiguous
span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12,
15, 20, 25, 50, or 100 amino acids of a HKLP polypeptide in the manufacture of antibodies, wherein
said contiguous span comprises at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ
ID No 4. In a preferred embodiment such polypeptides are useful in the manufacture of antibodies
to detect the presence and absence of the HKLP protein.

Non-human animals or mammals, whether wild-type or transgenic, which express a different
species of HKLP than the one to which antibody binding is desired, and animals which do not
express HKLP (i.e. a HKLP knock out animal as described herein) are particularly useful for
preparing antibodies. HKLP knock out animals will recognize all or most of the exposed regions of
HKLP as foreign antigens, and therefore produce antibodies with a wider array of HKLP epitopes.
Moreover, smaller polypeptides with only 10 to 30 amino acids may be useful in obtaining specific
binding to the HKLP protein. In addition, the humoral immune system of animals which produce a
species of HKLP that resembles the antigenic sequence will preferentially recognize the differences
between the animal's native HKLP species and the antigen sequence, and produce antibodies to
these unique sites in the antigen sequence. Such a technique will be particularly useful in obtaining
antibodies that specifically bind to the HKLP protein.

Antibody preparations prepared according to either protocol are useful in quantitative
immunoassays which determine concentrations of antigen-bearing substances in biological samples;
they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological
sample. The antibodies may also be used in therapeutic compositions for killing cells expressing the
protein or reducing the levels of the protein in the body.

The antibodies of the invention may be labeled, either by a radioactive, a fluorescent or an 
enzymatic label. 




Consequently, the invention is also directed to a method for detecting specifically the 
presence of a human HKLP polypeptide according to the invention in a biological sample, said 
method comprising the following steps : 

a) bringing into contact the biological sample with a polyclonal or monoclonal antibody
directed against the HKLP polypeptide of the amino acid sequence of SEQ ID No 4, or to a peptide
fragment or variant thereof;

b) detecting the antigen-antibody complex formed. 

The invention also concerns a diagnostic kit for detecting in vitro the presence of a human
HKLP polypeptide according to the present invention in a biological sample, wherein said kit
comprises:

a) a polyclonal or monoclonal antibody directed against the HKLP polypeptide of the amino
acid sequence of SEQ ID No 4, or to a peptide fragment or variant thereof, optionally labeled;

b) a reagent allowing the detection of the antigen-antibody complexes formed, said reagent
carrying optionally a label, or being able to be recognized itself by a labeled reagent, more
particularly in the case when the above-mentioned monoclonal or polyclonal antibody is not labeled
by itself.

HKLP-related Biallelic Markers

Advantages Of The Biallelic Markers Of The Present Invention 

The HKLP-related biallelic markers of the present invention offer a number of important
advantages over other genetic markers such as RFLP (Restriction fragment length polymorphism)
and VNTR (Variable Number of Tandem Repeats) markers.

The first generation of markers were RFLPs, which are variations that modify the length of
a restriction fragment. However, methods used to identify and to type RFLPs are relatively wasteful of
materials, effort, and time. The second generation of genetic markers were VNTRs, which can be
categorized as either minisatellites or microsatellites. Minisatellites are tandemly repeated DNA
sequences present in units of 5-50 repeats which are distributed along regions of the human
chromosomes ranging from 0.1 to 20 kilobases in length. Since they present many possible alleles,
their informative content is very high. Minisatellites are scored by performing Southern blots to
identify the number of tandem repeats present in a nucleic acid sample from the individual being
tested. However, there are only 10⁴ potential VNTRs that can be typed by Southern blotting.
Moreover, both RFLP and VNTR markers are costly and time-consuming to develop and assay in
large numbers.

Single nucleotide polymorphisms (SNPs) or biallelic markers can be used in the same manner as
RFLPs and VNTRs but offer several advantages. SNPs are densely spaced in the human genome and
represent the most frequent type of variation. An estimated number of more than 10⁷ sites are
scattered along the 3×10⁹ base pairs of the human genome. Therefore, SNPs occur at a greater
frequency and with greater uniformity than RFLP or VNTR markers, which means that there is a
greater probability that such a marker will be found in close proximity to a genetic locus of interest.
SNPs are less variable than VNTR markers but are mutationally more stable.
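
The density implied by these two figures can be worked out directly. The following lines are a simple
arithmetic illustration in Python; the variable names are chosen here for readability only.

```python
GENOME_SIZE_BP = 3 * 10**9      # base pairs of the human genome, per the text
ESTIMATED_SNP_SITES = 10**7     # lower bound on the number of sites, per the text

# Average spacing between sites if they were evenly distributed.
average_spacing_bp = GENOME_SIZE_BP // ESTIMATED_SNP_SITES
print(average_spacing_bp)  # 300
```

Since the estimate is "more than" 10⁷ sites, the average spacing would be at most about one site every
300 base pairs.
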
Also, the different forms of a characterized single nucleotide polymorphism, such as the
biallelic markers of the present invention, are often easier to distinguish and can therefore be typed
easily on a routine basis. Biallelic markers have single nucleotide based alleles and they have only
two common alleles, which allows highly parallel detection and automated scoring. The biallelic
markers of the present invention offer the possibility of rapid, high throughput genotyping of a large
number of individuals.

Biallelic markers are densely spaced in the genome, sufficiently informative and can be
assayed in large numbers. The combined effects of these advantages make biallelic markers
extremely valuable in genetic studies. Biallelic markers can be used in linkage studies in families, in
allele sharing methods, in linkage disequilibrium studies in populations, in association studies of
case-control populations or of trait positive and trait negative populations. An important aspect of
the present invention is that biallelic markers allow association studies to be performed to identify
genes involved in complex traits. Association studies examine the frequency of marker alleles in
unrelated case- and control-populations and are generally employed in the detection of polygenic or
sporadic traits. Association studies may be conducted within the general population and are not
limited to studies performed on related individuals in affected families (linkage studies). Biallelic
markers in different genes can be screened in parallel for direct association with disease or response
to a treatment. This multiple gene approach is a powerful tool for a variety of human genetic studies
as it provides the necessary statistical power to examine the synergistic effect of multiple genetic
factors on a particular phenotype, drug response, sporadic trait, or disease state with a complex
genetic etiology.
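
To make the comparison of marker allele frequencies in case and control populations concrete, the
following Python sketch counts alleles at one biallelic marker and computes a Pearson chi-square
statistic on the resulting 2x2 table. It is an assumed, minimal example: the genotype encoding as
two-letter strings, the function names, and the toy data are hypothetical, and the statistical tests
actually used in an association study may differ.

```python
def allele_counts(genotypes, alleles=("A", "G")):
    """Count both alleles of a biallelic marker in a list of genotypes.

    Each genotype is a two-character string such as "AA", "AG" or "GG".
    """
    first, second = alleles
    joined = "".join(genotypes)
    return joined.count(first), joined.count(second)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    cases = ["AA", "AG", "AA", "AG"]
    controls = ["GG", "AG", "GG", "AG"]
    case_a, case_g = allele_counts(cases)
    control_a, control_g = allele_counts(controls)
    print(case_a / (case_a + case_g), control_a / (control_a + control_g))
    print(chi_square_2x2(case_a, case_g, control_a, control_g))
```

With one degree of freedom, a statistic above 3.84 corresponds to a nominal significance level of 0.05;
real studies involve far larger samples and corrections for multiple testing.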

Candidate Gene Of The Present Invention 

Different approaches can be employed to perform association studies: genome-wide
association studies, candidate region association studies and candidate gene association studies.
Genome-wide association studies rely on the screening of genetic markers evenly spaced and
covering the entire genome. The candidate gene approach is based on the study of genetic markers
specifically located in genes potentially involved in a biological pathway related to the trait of
interest. In the present invention, HKLP is the candidate gene. The candidate gene analysis clearly
provides a short-cut approach to the identification of genes and gene polymorphisms related to a
particular trait when some information concerning the biology of the trait is available. However, it
should be noted that all of the biallelic markers disclosed in the instant application can be employed
as part of genome-wide association studies or as part of candidate region association studies and
such uses are specifically contemplated in the present invention and claims.

HKLP-Related Biallelic Markers And Polynucleotides Related Thereto

The invention also concerns HKLP-related biallelic markers. As used herein the term
"HKLP-related biallelic marker" relates to a set of biallelic markers in linkage disequilibrium with
the HKLP gene. The term HKLP-related biallelic marker includes the biallelic markers designated
A1 to A32.

A portion of the biallelic markers of the present invention are disclosed in Table 2. Their
location on the HKLP gene is indicated in Table 2 and also as a single base polymorphism in the
features of the related SEQ ID Nos 1-3 and 5-8. The pairs of primers allowing the amplification
of a nucleic acid containing the polymorphic base of one HKLP biallelic marker are listed in Table 1
of Example 2.

27 HKLP-related biallelic markers, A1 to A27, are located in the genomic sequence of
HKLP. Four of them are located in exonic sequence, namely A1, A23, A24 and A25. The other
HKLP-related biallelic markers are located in intronic regions of HKLP. Additionally, 5 biallelic
markers are located in an intergenic region and are in linkage disequilibrium with the HKLP gene.

The primers for amplification or sequencing reaction of a polynucleotide comprising a
biallelic marker of the invention may be designed from the disclosed sequences for any method
known in the art. A preferred set of primers are fashioned such that the 3' end of the contiguous
span of identity with a sequence selected from the group consisting of SEQ ID Nos 1-3 and 5-8 or a
sequence complementary thereto or a variant thereof is present at the 3' end of the primer. Such a
configuration allows the 3' end of the primer to hybridize to a selected nucleic acid sequence and
dramatically increases the efficiency of the primer for amplification or sequencing reactions. Allele
specific primers may be designed such that a polymorphic base of a biallelic marker is at the 3' end
of the contiguous span and the contiguous span is present at the 3' end of the primer. Such allele
specific primers tend to selectively prime an amplification or sequencing reaction so long as they are
used with a nucleic acid sample that contains one of the two alleles present at a biallelic marker.
The 3' end of the primer of the invention may be located within or at least 2, 4, 6, 8, 10, 12, 15, 18,
20, 25, 50, 100, 250, 500, or 1000 nucleotides upstream of a HKLP-related biallelic marker in said
sequence or at any other location which is appropriate for their intended use in sequencing,
amplification or the location of novel sequences or markers. Thus, another set of preferred
amplification primers comprise an isolated polynucleotide consisting essentially of a contiguous
span of 8 to 50 nucleotides in a sequence selected from the group consisting of SEQ ID Nos 1-3 and
5-8 or a sequence complementary thereto or a variant thereof, wherein the 3' end of said contiguous
span is located at the 3' end of said polynucleotide, and wherein the 3' end of said polynucleotide is
located upstream of a HKLP-related biallelic marker in said sequence. Preferably, those
amplification primers comprise a sequence selected from the group consisting of the sequences B1
to B25 and C1 to C25. Primers with their 3' ends located 1 nucleotide upstream of a HKLP-related
biallelic marker have a special utility in microsequencing assays. Preferred microsequencing
primers are described in Table 4. Optionally, microsequencing primers are selected from the group
consisting of the nucleotide sequences D1 to D30 and E1 to E30.
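
The primer geometries described above can be illustrated with a minimal Python sketch. It is not the procedure used to obtain the primers of Tables 1 and 4; the function names, the toy sequence, the 0-based marker index and the default lengths are all assumptions made for illustration only.

```python
def microsequencing_primer(seq: str, marker_index: int, length: int = 19) -> str:
    """Primer whose 3' end is the base immediately upstream of the marker.

    seq          -- strand read 5'->3' (hypothetical input)
    marker_index -- 0-based position of the polymorphic base in seq
    """
    start = marker_index - length
    if start < 0:
        raise ValueError("not enough upstream sequence for the requested length")
    return seq[start:marker_index]


def allele_specific_primer(seq: str, marker_index: int, allele: str,
                           length: int = 20) -> str:
    """Primer whose 3'-terminal base is the chosen allele of the marker."""
    start = marker_index - (length - 1)
    if start < 0:
        raise ValueError("not enough upstream sequence for the requested length")
    return seq[start:marker_index] + allele


# Toy example: the polymorphic base (T or C) sits at index 12.
toy = "ACGTTGCAAGCTTAGGCTAACGT"
print(microsequencing_primer(toy, 12, length=8))       # ends just before the marker
print(allele_specific_primer(toy, 12, "C", length=8))  # ends on the C allele
```

Primers for the complementary strand would be derived in the same way from the reverse complement of the sequence.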

The probes of the present invention may be designed from the disclosed sequences for any 
method known in the art, particularly methods which allow for testing if a marker disclosed herein is 
present. A preferred set of probes may be designed for use in the hybridization assays of the 
invention in any manner known in the art such that they selectively bind to one allele of a biallelic
marker, but not the other allele under any particular set of assay conditions. Preferred hybridization
probes comprise the polymorphic base of either allele 1 or allele 2 of the specific biallelic marker. 
Optionally, said biallelic marker may be within 6, 5, 4, 3, 2, or 1 nucleotides of the center of the 
hybridization probe or at the center of said probe. 
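
As an illustration of the probe layout just described, the following Python sketch builds one probe per allele with the polymorphic base at, or a few nucleotides away from, the probe center. The function name, the odd default length of 25 nucleotides and the toy sequence are assumptions and are not taken from the disclosure.

```python
def allele_probes(seq, marker_index, alleles, length=25, offset=0):
    """Return {allele: probe}; the marker lies `offset` bases 3' of the probe center.

    seq          -- strand read 5'->3' (hypothetical input)
    marker_index -- 0-based position of the polymorphic base in seq
    offset       -- 0 places the marker at the center; up to 6 in either direction
    """
    if length % 2 == 0:
        raise ValueError("use an odd length so that the probe has a single central base")
    if abs(offset) > 6:
        raise ValueError("marker must be within 6 nucleotides of the probe center")
    start = marker_index - length // 2 - offset
    end = start + length
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for the requested probe")
    return {a: seq[start:marker_index] + a + seq[marker_index + 1:end] for a in alleles}


toy = "ACGTTGCAAGCTTAGGCTAACGT"
probes = allele_probes(toy, 12, ("T", "C"), length=7)
print(probes)  # two probes differing only at their central base
```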

It should be noted that the polynucleotides of the present invention are not limited to having
the exact flanking sequences surrounding the polymorphic bases which are enumerated in the Sequence
Listing. Rather, it will be appreciated that the flanking sequences surrounding the biallelic markers
may be lengthened or shortened to any extent compatible with their intended use and the present
invention specifically contemplates such sequences. The flanking regions outside of the contiguous
span need not be homologous to native flanking sequences which actually occur in human subjects.
The addition of any nucleotide sequence which is compatible with the nucleotide's intended use is
specifically contemplated.

Primers and probes may be labeled or immobilized on a solid support as described in 
"Oligonucleotide probes and primers". 

The polynucleotides of the invention which are attached to a solid support encompass
polynucleotides with any further limitation described in this disclosure, or those following, specified
alone or in any combination: Optionally, said polynucleotides may be specified as attached
individually or in groups of at least 2, 5, 8, 10, 12, 15, 20, or 25 distinct polynucleotides of the
invention to a single solid support. Optionally, polynucleotides other than those of the invention
may be attached to the same solid support as polynucleotides of the invention. Optionally, when
multiple polynucleotides are attached to a solid support they may be attached at random locations, or
in an ordered array. Optionally, said ordered array may be addressable.

The present invention also encompasses diagnostic kits comprising one or more
polynucleotides of the invention with a portion or all of the necessary reagents and instructions for
genotyping a test subject by determining the identity of a nucleotide at a HKLP-related biallelic
marker. The polynucleotides of a kit may optionally be attached to a solid support, or be part of an
array or addressable array of polynucleotides. The kit may provide for the determination of the
identity of the nucleotide at a marker position by any method known in the art including, but not
limited to, a sequencing assay method, a microsequencing assay method, a hybridization assay
method, or an enzyme-based mismatch detection method.

Methods For De Novo Identification Of Biallelic Markers
Any of a variety of methods can be used to screen a genomic fragment for single nucleotide
polymorphisms such as differential hybridization with oligonucleotide probes, detection of changes
in the mobility measured by gel electrophoresis or direct sequencing of the amplified nucleic acid. 
A preferred method for identifying biallelic markers involves comparative sequencing of genomic 
DNA fragments from an appropriate number of unrelated individuals. 

In a first embodiment, DNA samples from unrelated individuals are pooled together, 
following which the genomic DNA of interest is amplified and sequenced. The nucleotide
sequences thus obtained are then analyzed to identify significant polymorphisms. One of the major
advantages of this method resides in the fact that the pooling of the DNA samples substantially
reduces the number of DNA amplification reactions and sequencing reactions, which must be carried
out. Moreover, this method is sufficiently sensitive so that a biallelic marker obtained thereby
usually demonstrates a sufficient frequency of its less common allele to be useful in conducting
association studies. 

In a second embodiment, the DNA samples are not pooled and are therefore amplified and 
sequenced individually. This method is usually preferred when biallelic markers need to be 
identified in order to perform association studies within candidate genes. Preferably, highly relevant
gene regions such as promoter regions or exon regions may be screened for biallelic markers. A
biallelic marker obtained using this method may show a lower degree of informativeness for
conducting association studies, e.g. if the frequency of its less frequent allele is less than about
10%. Such a biallelic marker will, however, be sufficiently informative to conduct association
studies and it will further be appreciated that including less informative biallelic markers in the
genetic analysis studies of the present invention may allow in some cases the direct identification of
causal mutations, which may, depending on their penetrance, be rare mutations.

The following is a description of the various parameters of a preferred method used by the 
inventors for the identification of the biallelic markers of the present invention. 

Genomic DNA Samples 

The genomic DNA samples from which the biallelic markers of the present invention are
generated are preferably obtained from unrelated individuals corresponding to a heterogeneous
population of known ethnic background. The number of individuals from whom DNA samples are
obtained can vary substantially, preferably from about 10 to about 1000, more preferably from about
50 to about 200 individuals. It is usually preferred to collect DNA samples from at least about 100
individuals in order to have sufficient polymorphic diversity in a given population to identify as
many markers as possible and to generate statistically significant results.

As for the source of the genomic DNA to be subjected to analysis, any test sample can be
foreseen without any particular limitation. These test samples include biological samples, which can
be tested by the methods of the present invention described herein, and include human and animal
body fluids such as whole blood, serum, plasma, cerebrospinal fluid, urine, lymph fluids, and
various external secretions of the respiratory, intestinal and genitourinary tracts, tears, saliva, milk,
white blood cells, myelomas and the like; biological fluids such as cell culture supernatants; fixed
tissue specimens including tumor and non-tumor tissue and lymph node tissues; bone marrow
aspirates and fixed cell specimens. The preferred source of genomic DNA used in the present
invention is from peripheral venous blood of each donor. Techniques to prepare genomic DNA
from biological samples are well known to the skilled technician. Details of a preferred embodiment
are provided in Example 1. The person skilled in the art can choose to amplify pooled or unpooled
DNA samples.

DNA Amplification 

The identification of biallelic markers in a sample of genomic DNA may be facilitated
through the use of DNA amplification methods. DNA samples can be pooled or unpooled for the
amplification step. DNA amplification techniques are well known to those skilled in the art.
Various methods to amplify DNA fragments carrying biallelic markers are further described
hereinbefore in "Amplification of the HKLP gene". The PCR technology is the preferred
amplification technique used to identify new biallelic markers. A typical example of a PCR reaction
suitable for the purposes of the present invention is provided in Example 2.

In a first embodiment of the present invention, biallelic markers are identified using genomic 
sequence information generated by the inventors. Sequenced genomic DNA fragments are used to 
design primers for the amplification of 500 bp fragments. These 500 bp fragments are amplified 
from genomic DNA and are scanned for biallelic markers. Primers may be designed using the OSP
software (Hillier L. and Green P., 1991). All primers may contain, upstream of the specific target
bases, a common oligonucleotide tail that serves as a sequencing primer. Those skilled in the art are 
familiar with primer extensions, which can be used for these purposes. 

Preferred primers, useful for the amplification of genomic sequences encoding the candidate 
genes, focus on promoters, exons and splice sites of the genes. A biallelic marker is more
likely to be a causal mutation if it is located in these functional regions of the gene.
Preferred amplification primers of the invention include the nucleotide sequences B1 to B25 and C1
to C25, detailed further in Example 2, Table 1.

Sequencing Of Amplified Genomic DNA And Identification Of Single Nucleotide 
Polymorphisms 

The amplification products generated as described above are then sequenced using any
method known and available to the skilled technician. Methods for sequencing DNA using either
the dideoxy-mediated method (Sanger method) or the Maxam-Gilbert method are widely known to
those of ordinary skill in the art. Such methods are for example disclosed in Sambrook et al.(1989).
Alternative approaches include hybridization to high-density DNA probe arrays as described in Chee
et al.(1996).

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing
reactions using a dye-primer cycle sequencing protocol. The products of the sequencing reactions
are run on sequencing gels and the sequences are determined using gel image analysis. The
polymorphism search is based on the presence of superimposed peaks in the electrophoresis pattern
resulting from different bases occurring at the same position. Because each dideoxy terminator is
labeled with a different fluorescent molecule, the two peaks corresponding to a biallelic site present
distinct colors corresponding to two different nucleotides at the same position on the sequence.
However, the presence of two peaks can be an artifact due to background noise. To exclude such an
artifact, the two DNA strands are sequenced and a comparison between the peaks is carried out. In
order to be registered as a polymorphic sequence, the polymorphism has to be detected on both
strands.
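
The two-strand check described above can be sketched as follows. This is only an illustrative Python sketch, not the gel image analysis actually used: the peak-height dictionaries, the function names and the 0.3 peak-ratio threshold are hypothetical assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def called_bases(peaks, min_ratio=0.3):
    """Bases whose peak height is at least min_ratio of the tallest peak.

    peaks -- {base: peak height} at one position (hypothetical representation)
    """
    tallest = max(peaks.values())
    return frozenset(b for b, h in peaks.items() if h >= min_ratio * tallest)


def confirmed_polymorphism(forward_peaks, reverse_peaks, min_ratio=0.3):
    """Return the two alleles if both strands show the same pair of bases, else None.

    reverse_peaks holds the peaks as read on the complementary strand, so the
    called bases are complemented before being compared with the forward strand.
    """
    fwd = called_bases(forward_peaks, min_ratio)
    rev = frozenset(COMPLEMENT[b] for b in called_bases(reverse_peaks, min_ratio))
    if len(fwd) == 2 and fwd == rev:
        return fwd
    return None


# Superimposed A/G peaks on the forward strand, matched by T/C on the reverse strand.
print(confirmed_polymorphism({"A": 900, "G": 700, "C": 20, "T": 10},
                             {"T": 850, "C": 640, "A": 15, "G": 30}))
# A second peak seen on one strand only is treated as background noise.
print(confirmed_polymorphism({"A": 900, "G": 400, "C": 20, "T": 10},
                             {"T": 900, "C": 30, "A": 15, "G": 20}))
```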

The above procedure permits those amplification products which contain biallelic markers
to be identified. The detection limit for the frequency of biallelic polymorphisms detected by
sequencing pools of 100 individuals is approximately 0.1 for the minor allele, as verified by
sequencing pools of known allelic frequencies. However, more than 90% of the biallelic
polymorphisms detected by the pooling method have a frequency for the minor allele higher than
0.25. Therefore, the biallelic markers selected by this method have a frequency of at least 0.1 for the
minor allele and less than 0.9 for the major allele, preferably at least 0.2 for the minor allele and less
than 0.8 for the major allele, more preferably at least 0.3 for the minor allele and less than 0.7 for the
major allele, thus a heterozygosity rate higher than 0.18, preferably higher than 0.32, more
preferably higher than 0.42.
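
The heterozygosity rates quoted above are consistent with the expected heterozygosity 2pq of a biallelic marker, where q is the minor allele frequency and p = 1 - q. The short Python sketch below reproduces the three figures; the assumption of Hardy-Weinberg proportions is ours and is not stated in the text.

```python
def expected_heterozygosity(minor_allele_frequency: float) -> float:
    """Expected heterozygosity 2pq for a biallelic marker (Hardy-Weinberg assumed)."""
    q = minor_allele_frequency
    if not 0.0 <= q <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return 2.0 * q * (1.0 - q)


for q in (0.1, 0.2, 0.3):
    print(q, round(expected_heterozygosity(q), 2))
```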

In another embodiment, biallelic markers are detected by sequencing individual DNA
samples; the frequency of the minor allele of such a biallelic marker may be less than 0.1.

Validation Of The Biallelic Markers Of The Present Invention 

The polymorphisms are evaluated for their usefulness as genetic markers by validating that 
both alleles are present in a population. Validation of the biallelic markers is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. Microsequencing is a preferred method of genotyping alleles. The validation by
genotyping step may be performed on individual samples derived from each individual in the group
or by genotyping a pooled sample derived from more than one individual. The group can be as
small as one individual if that individual is heterozygous for the allele in question. Preferably the
group contains at least three individuals, more preferably the group contains five or six individuals,
so that a single validation test will be more likely to result in the validation of more of the biallelic
markers that are being tested. It should be noted, however, that when the validation test is
performed on a small group it may result in a false negative result if as a result of sampling error
none of the individuals tested carries one of the two alleles. Thus, the validation process is less
useful in demonstrating that a particular initial result is an artifact, than it is at demonstrating that
there is a bona fide biallelic marker at a particular position in a sequence. All of the genotyping, 
haplotyping, association, and interaction study methods of the invention may optionally be 
performed solely with validated biallelic markers. 
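
The risk of a false negative in a small validation group can be estimated with a short calculation. The Python sketch below assumes, for illustration only, that the 2n chromosomes of n unrelated individuals are independent draws from the population; the function name is hypothetical.

```python
def probability_single_allele_observed(minor_allele_frequency: float,
                                       group_size: int) -> float:
    """Probability that only one of the two alleles appears among group_size individuals.

    Each individual contributes two chromosomes, treated here as independent
    draws (an assumption), so the marker fails validation with probability
    p**(2n) + q**(2n), where q is the minor allele frequency and p = 1 - q.
    """
    q = minor_allele_frequency
    p = 1.0 - q
    chromosomes = 2 * group_size
    return p ** chromosomes + q ** chromosomes


for n in (1, 3, 6, 20):
    print(n, round(probability_single_allele_observed(0.1, n), 3))
```

Under these assumptions the probability falls as the group grows, which is in line with the preference for groups of at least three, and more preferably five or six, individuals.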

Evaluation Of The Frequency Of The Biallelic Markers Of The Present Invention 

The validated biallelic markers are further evaluated for their usefulness as genetic markers
by determining the frequency of the least common allele at the biallelic marker site. The higher the
frequency of the less common allele the greater the usefulness of the biallelic marker in association
and interaction studies. The determination of the least common allele is accomplished by
genotyping a group of individuals by a method of the invention and demonstrating that both alleles
are present. This determination of frequency by genotyping step may be performed on individual
samples derived from each individual in the group or by genotyping a pooled sample derived from
more than one individual. The group must be large enough to be representative of the population as
a whole. Preferably the group contains at least 20 individuals, more preferably the group contains at
least 50 individuals, most preferably the group contains at least 100 individuals. Of course the larger
the group the greater the accuracy of the frequency determination because of reduced sampling error.
A biallelic marker wherein the frequency of the less common allele is 30% or more is termed a "high
quality biallelic marker". All of the genotyping, haplotyping, association, and interaction study
methods of the invention may optionally be performed solely with high quality biallelic markers.
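
The effect of group size on sampling error can be illustrated with the binomial standard error of an allele frequency estimated from 2n chromosomes. The following Python sketch is an assumption-laden illustration (independent chromosomes, hypothetical function names); only the 30% threshold is taken from the text.

```python
import math


def frequency_standard_error(frequency: float, group_size: int) -> float:
    """Binomial standard error of an allele frequency estimated from 2n chromosomes."""
    return math.sqrt(frequency * (1.0 - frequency) / (2 * group_size))


def is_high_quality(minor_allele_frequency: float) -> bool:
    """True when the less common allele has a frequency of 30% or more."""
    return minor_allele_frequency >= 0.30


for n in (20, 50, 100):
    print(n, round(frequency_standard_error(0.3, n), 3))
```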

The invention also relates to methods of estimating the frequency of an allele in a population
comprising: a) genotyping individuals from said population for said biallelic marker according to the
method of the present invention; b) determining the proportional representation of said biallelic
marker in said population. In addition, the methods of estimating the frequency of an allele in a
population of the invention encompass methods with any further limitation described in this
disclosure, or those following, specified alone or in any combination; optionally, wherein said
HKLP-related biallelic marker is selected from the group consisting of A1 to A32, and the
complements thereof, or optionally the biallelic markers in linkage disequilibrium therewith;
optionally, wherein said HKLP-related biallelic marker is selected from the group consisting of A1
to A22 and A25 to A32, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, wherein said HKLP-related biallelic marker is selected from
the group consisting of A23 and A24; optionally, determining the frequency of a biallelic marker
allele in a population may be accomplished by determining the identity of the nucleotides for both
copies of said biallelic marker present in the genome of each individual in said population and
calculating the proportional representation of said nucleotide at said HKLP-related biallelic marker
for the population; optionally, determining the proportional representation may be accomplished by
performing a genotyping method of the invention on a pooled biological sample derived from a
representative number of individuals, or each individual, in said population, and calculating the
proportional amount of said nucleotide compared with the total.
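
When both copies of a marker are determined for each individual, the proportional representation of each nucleotide is obtained by simple allele counting. A minimal Python sketch follows; the two-letter genotype strings and the function name are illustrative assumptions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Proportional representation of each nucleotide at one biallelic marker.

    genotypes -- iterable of two-character strings such as "AG", one per
                 individual, giving both copies of the marker (hypothetical format)
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2:
            raise ValueError("each genotype must list exactly two alleles")
        counts.update(genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


print(allele_frequencies(["AA", "AG", "GG", "AG", "AA"]))
```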

Methods For Genotyping An Individual For Biallelic Markers 
Methods are provided to genotype a biological sample for one or more biallelic markers of 
the present invention, all of which may be performed in vitro. Such methods of genotyping 
comprise determining the identity of a nucleotide at a HKLP biallelic marker site by any method
known in the art. These methods find use in genotyping case-control populations in association
studies as well as individuals in the context of detection of alleles of biallelic markers which are
known to be associated with a given trait, in which case both copies of the biallelic marker present in
an individual's genome are determined so that an individual may be classified as homozygous or
heterozygous for a particular allele.

These genotyping methods can be performed on nucleic acid samples derived from a single 
individual or pooled DNA samples. 

Genotyping can be performed using similar methods as those described above for the 
identification of the biallelic markers, or using other genotyping methods such as those further 
described below. In preferred embodiments, the comparison of sequences of amplified genomic
fragments from different individuals is used to identify new biallelic markers whereas 
microsequencing is used for genotyping known biallelic markers in diagnostic and association study 
applications. 

In one embodiment the invention encompasses methods of genotyping comprising
determining the identity of a nucleotide at a HKLP-related biallelic marker or the complement
thereof in a biological sample; optionally, said HKLP-related biallelic marker is selected from the
group consisting of A1 to A32, and the complements thereof, or optionally the biallelic markers in
linkage disequilibrium therewith; optionally, said HKLP-related biallelic marker is selected from the
group consisting of A1 to A17, and A20 to A22, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, said HKLP-related biallelic marker
is selected from the group consisting of A23 and A24, and the complements thereof, or optionally
the biallelic markers in linkage disequilibrium therewith; optionally, wherein said biological sample
is derived from a single subject; optionally, wherein the identity of the nucleotides at said biallelic
marker is determined for both copies of said biallelic marker present in said individual's genome;
optionally, wherein said biological sample is derived from multiple subjects; optionally, further
comprising amplifying a portion of said sequence comprising the biallelic marker prior to said
determining step; optionally, wherein said amplifying is performed by PCR; optionally, wherein said
determining is performed by a hybridization assay, a sequencing assay, a microsequencing assay, or
an enzyme-based mismatch detection assay.

Source of DNA for genotyping 

Any source of nucleic acids, in purified or non-purified form, can be utilized as the starting
nucleic acid, provided it contains or is suspected of containing the specific nucleic acid sequence
desired. DNA or RNA may be extracted from cells, tissues, body fluids and the like as described
above. While nucleic acids for use in the genotyping methods of the invention can be derived from
any mammalian source, the test subjects and individuals from which nucleic acid samples are taken
are generally understood to be human.

Amplification Of DNA Fragments Comprising Biallelic Markers 

Methods and polynucleotides are provided to amplify a segment of nucleotides comprising
one or more biallelic markers of the present invention. It will be appreciated that amplification of
DNA fragments comprising biallelic markers may be used in various methods and for various
purposes and is not restricted to genotyping. Nevertheless, many genotyping methods, although not
all, require the previous amplification of the DNA region carrying the biallelic marker of interest.
Such methods specifically increase the concentration or total number of sequences that span the
biallelic marker or include that site and sequences located either distal or proximal to it. Diagnostic
assays may also rely on amplification of DNA segments carrying a biallelic marker of the present
invention. Amplification of DNA may be achieved by any method known in the art. Amplification
techniques are described above in the section entitled "Amplification of the HKLP gene".

Some of these amplification methods are particularly suited for the detection of single 
nucleotide polymorphisms and allow the simultaneous amplification of a target sequence and the 
identification of the polymorphic nucleotide, as further described below.

The identification of biallelic markers as described above allows the design of appropriate
oligonucleotides, which can be used as primers to amplify DNA fragments comprising the biallelic
markers of the present invention. Amplification can be performed using the primers initially used to 
discover new biallelic markers which are described herein or any set of primers allowing the 
amplification of a DNA fragment comprising a biallelic marker of the present invention. 

In some embodiments the present invention provides primers for amplifying a DNA
fragment containing one or more biallelic markers of the present invention. Preferred amplification
primers are listed in Example 2. It will be appreciated that the primers listed are merely exemplary
and that any other set of primers which produce amplification products containing one or more
biallelic markers of the present invention may be used.

The spacing of the primers determines the length of the segment to be amplified. In the
context of the present invention, amplified segments carrying biallelic markers can range in size
from at least about 25 bp to 35 kbp. Amplification fragments from 25-3000 bp are typical,
fragments from 50-1000 bp are preferred and fragments from 100-600 bp are highly preferred. It
will be appreciated that amplification primers for the biallelic markers may be any sequence which
allows the specific amplification of any DNA fragment carrying the markers. Amplification primers
may be labeled or immobilized on a solid support as described in "Oligonucleotide probes and
primers".

Methods of Genotyping DNA samples for Biallelic Markers 

Any method known in the art can be used to identify the nucleotide present at a biallelic
marker site. Since the biallelic marker allele to be detected has been identified and specified in the
present invention, detection will prove simple for one of ordinary skill in the art by employing any
of a number of techniques. Many genotyping methods require the previous amplification of the
DNA region carrying the biallelic marker of interest. While the amplification of target or signal is
often preferred at present, ultrasensitive detection methods which do not require amplification are
also encompassed by the present genotyping methods. Methods well-known to those skilled in the
art that can be used to detect biallelic polymorphisms include methods such as conventional dot blot
analyses, single strand conformational polymorphism analysis (SSCP) described by Orita et
al.(1989), denaturing gradient gel electrophoresis (DGGE), heteroduplex analysis, mismatch
cleavage detection, and other conventional techniques as described in Sheffield et al.(1991), White
et al.(1992), Grompe et al.(1989 and 1993). Another method for determining the identity of the
nucleotide present at a particular polymorphic site employs a specialized exonuclease-resistant
nucleotide derivative as described in US patent 4,656,127.

Preferred methods involve directly determining the identity of the nucleotide present at a 
biallelic marker site by sequencing assay, enzyme-based mismatch detection assay, or hybridization 
assay. The following is a description of some preferred methods. A highly preferred method is the
microsequencing technique. The term "sequencing" is used herein to refer to polymerase extension
of duplex primer/template complexes and includes both traditional sequencing and microsequencing.

1) Sequencing Assays

The nucleotide present at a polymorphic site can be determined by sequencing methods. In 
a preferred embodiment, DNA samples are subjected to PCR amplification before sequencing as 
described above. DNA sequencing methods are described in "Sequencing Of Amplified Genomic
DNA And Identification Of Single Nucleotide Polymorphisms". 

Preferably, the amplified DNA is subjected to automated dideoxy terminator sequencing 
reactions using a dye-primer cycle sequencing protocol. Sequence analysis allows the identification 
of the base present at the biallelic marker site.

2) Microsequencing Assays

In microsequencing methods, the nucleotide at a polymorphic site in a target DNA is 
detected by a single nucleotide primer extension reaction. This method involves appropriate
microsequencing primers which hybridize just upstream of the polymorphic base of interest in the
target nucleic acid. A polymerase is used to specifically extend the 3' end of the primer with one
single ddNTP (chain terminator) complementary to the nucleotide at the polymorphic site. Next the
identity of the incorporated nucleotide is determined in any suitable way. 
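
The logic of the single nucleotide primer extension can be sketched in Python. This is a simplified illustration and not a model of the chemistry: each copy of the marker is given as the strand having the same sense as the primer, so that the ddNTP added by the polymerase corresponds to the base that follows the primer on that strand. The sequences and function names are hypothetical.

```python
def extended_base(sense_strand: str, primer: str) -> str:
    """Base added to the 3' end of the primer by single nucleotide extension.

    sense_strand -- strand with the same sense as the primer; the primer anneals
                    to its complement, so the incorporated ddNTP matches the
                    base following the primer on this strand
    """
    site = sense_strand.find(primer)
    if site < 0:
        raise ValueError("primer does not match the target sequence")
    next_index = site + len(primer)
    if next_index >= len(sense_strand):
        raise ValueError("no template base downstream of the primer")
    return sense_strand[next_index]


def microsequencing_genotype(allele_copies, primer: str) -> str:
    """Genotype call from the bases incorporated on both copies of the marker."""
    return "".join(sorted(extended_base(copy, primer) for copy in allele_copies))


# Heterozygous toy sample: one copy carries T and the other C after the primer.
copies = ("ACGTTGCAAGCTTAGG", "ACGTTGCAAGCTCAGG")
print(microsequencing_genotype(copies, "TGCAAGCT"))
```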

Typically, microsequencing reactions are carried out using fluorescent ddNTPs and the
extended microsequencing primers are analyzed by electrophoresis on ABI 377 sequencing
machines to determine the identity of the incorporated nucleotide as described in EP 412 883.
Alternatively, capillary electrophoresis can be used in order to process a higher number of assays
simultaneously. An example of a typical microsequencing procedure that can be used in the context
of the present invention is provided in Example 4.

Different approaches can be used for the labeling and detection of ddNTPs. A homogeneous
phase detection method based on fluorescence resonance energy transfer has been described by Chen
and Kwok (1997) and Chen et al.(1997). In this method, amplified genomic DNA fragments
containing polymorphic sites are incubated with a 5'-fluorescein-labeled primer in the presence of
allelic dye-labeled dideoxyribonucleoside triphosphates and a modified Taq polymerase. The
dye-labeled primer is extended one base by the dye-terminator specific for the allele present on the
template. At the end of the genotyping reaction, the fluorescence intensities of the two dyes in the
reaction mixture are analyzed directly without separation or purification. All these steps can be
performed in the same tube and the fluorescence changes can be monitored in real time.
Alternatively, the extended primer may be analyzed by MALDI-TOF Mass Spectrometry. The base
at the polymorphic site is identified by the mass added onto the microsequencing primer (see Haff
and Smirnov, 1997).

Microsequencing may be achieved by the established microsequencing method or by 
developments or derivatives thereof. Alternative methods include several solid-phase 
microsequencing techniques. The basic microsequencing protocol is the same as described 
previously, except that the method is conducted as a heterogeneous phase assay, in which the primer 

30 or the target molecule is immobilized or captured onto a solid support. To simplify the primer 
separation and the terminal nucleotide addition analysis, oligonucleotides are attached to solid 
supports or are modified in such ways that permit affinity separation as well as polymerase 
extension. The 5' ends and internal nucleotides of synthetic oligonucleotides can be modified in a 
number of different ways to permit different affinity separation approaches, e.g., biotinylation. If a 

35 single affinity group is used on the oligonucleotides, the oligonucleotides can be separated from the 
incorporated terminator regent. This eliminates the need of physical or size separation. More than 
one oligonucleotide can be separated from the terminator reagent and analyzed simultaneously if 
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more than one affinity group is used. This permits the analysis of several nucleic acid species or 
more nucleic acid sequence information per extension reaction. The affinity group need not be on 
the priming oligonucleotide but could alternatively be present on the template. For example, 
immobilization can be carried out via an interaction between biotinylated DNA and streptavidin- 

5 coated microtitration wells or avidin-coated polystyrene particles. In the same manner, 

oligonucleotides or templates may be attached to a solid support in a high-density format. In such 
solid phase microsequencing reactions, incorporated ddNTPs can be radiolabeled (SyvSnen, 1994) 
or linked to fluorescein (Livak and Hainer, 1994). The detection of radiolabeled ddNTPs can be 
achieved through scintillation-based techniques. The detection of fluorescein-linked ddNTPs can be
based on the binding of antifluorescein antibody conjugated with alkaline phosphatase, followed by
incubation with a chromogenic substrate (such as p-nitrophenyl phosphate). Other possible reporter-
detection pairs include: ddNTP linked to dinitrophenyl (DNP) and anti-DNP alkaline phosphatase
conjugate (Harju et al., 1993) or biotinylated ddNTP and horseradish peroxidase-conjugated 
streptavidin with o-phenylenediamine as a substrate (WO 92/15712). As yet another alternative
solid-phase microsequencing procedure, Nyren et al. (1993) described a method relying on the
detection of DNA polymerase activity by an enzymatic luminometric inorganic pyrophosphate 
detection assay (ELIDA). 

Pastinen et al. (1997) describe a method for multiplex detection of single nucleotide
polymorphisms in which the solid phase minisequencing principle is applied to an oligonucleotide
array format. High-density arrays of DNA probes attached to a solid support (DNA chips) are
further described below. 

In one aspect the present invention provides polynucleotides and methods to genotype one
or more biallelic markers of the present invention by performing a microsequencing assay. Preferred
microsequencing primers include the nucleotide sequences D1 to D30 and E1 to E30. It will be
appreciated that the microsequencing primers listed in Example 4 are merely exemplary and that
any primer having a 3' end immediately adjacent to the polymorphic nucleotide may be used.
Similarly, it will be appreciated that microsequencing analysis may be performed for any biallelic
marker or any combination of biallelic markers of the present invention. One aspect of the present
invention is a solid support which includes one or more microsequencing primers listed in Example
4, or fragments comprising at least 8, 12, 15, 20, 25, 30, 40, or 50 consecutive nucleotides thereof
and having a 3' terminus immediately upstream of the corresponding biallelic marker, for
determining the identity of a nucleotide at a biallelic marker site.
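
By way of illustration only, the selection of a microsequencing primer whose 3' end lies immediately adjacent to the polymorphic nucleotide may be sketched in Python as follows. The function name, the example amplicon, the marker position and the primer length are hypothetical and are not taken from Example 4; the sketch covers only the primer on the strand shown.

```python
def microsequencing_primer(sequence, marker_index, length=19):
    """Return the `length` bases immediately 5' of the polymorphic base.

    `marker_index` is the 0-based position of the polymorphic base in
    `sequence`; the default length of 19 is an arbitrary assumption.
    """
    if length < 1:
        raise ValueError("primer length must be positive")
    start = marker_index - length
    if start < 0 or marker_index >= len(sequence):
        raise ValueError("not enough flanking sequence upstream of the marker")
    # The slice stops just before the marker, so the primer's 3' end
    # abuts the polymorphic nucleotide without including it.
    return sequence[start:marker_index].upper()


# Hypothetical amplicon; the polymorphic base (written "R") is at index 12.
amplicon = "ACGTTGCAGGCTRTTAGCCATG"
primer = microsequencing_primer(amplicon, marker_index=12, length=8)
```

The single ddNTP incorporated after such a primer then reports the identity of the base at the marker site.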

3) Mismatch detection assays based on polymerases and ligases 

In one aspect the present invention provides polynucleotides and methods to determine the
allele of one or more biallelic markers of the present invention in a biological sample, by mismatch
detection assays based on polymerases and/or ligases. These assays are based on the specificity of
polymerases and ligases. Polymerization reactions place particularly stringent requirements on
correct base pairing of the 3' end of the amplification primer, and the joining of two oligonucleotides
hybridized to a target DNA sequence is quite sensitive to mismatches close to the ligation site,
especially at the 3' end. Methods, primers and various parameters to amplify DNA fragments
comprising biallelic markers of the present invention are further described above in "Amplification
Of DNA Fragments Comprising Biallelic Markers".

Allele Specific Amplification Primers 

Discrimination between the two alleles of a biallelic marker can also be achieved by allele
specific amplification, a selective strategy, whereby one of the alleles is amplified without
amplification of the other allele. This is accomplished by placing the polymorphic base at the 3' end
of one of the amplification primers. Because extension proceeds from the 3' end of the primer, a
mismatch at or near this position has an inhibitory effect on amplification. Therefore, under
appropriate amplification conditions, these primers only direct amplification on their complementary
allele. Determining the precise location of the mismatch and the corresponding assay conditions are
well within the ordinary skill in the art.
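
The placement of the polymorphic base at the 3' end of an amplification primer may be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the example flanking sequence, the alleles and the primer length are hypothetical, and no melting-temperature or specificity checks are made.

```python
def allele_specific_primers(upstream, alleles, length=20):
    """Build one primer per allele, each ending in that allele at its 3' end.

    `upstream` is the sequence immediately 5' of the polymorphic base on
    the strand being primed; `length` (an assumed default) counts the
    allele base itself.
    """
    if length < 2:
        raise ValueError("primer must contain flank plus the allele base")
    if len(upstream) < length - 1:
        raise ValueError("not enough flanking sequence")
    flank = upstream[-(length - 1):].upper()
    # Shared 5' portion, allele-specific 3' terminal base.
    return {allele: flank + allele.upper() for allele in alleles}


# Hypothetical flank and alleles.
primers = allele_specific_primers("GATTACAGGCT", ("A", "G"), length=6)
```

The two primers differ only at their final base, which is the position where a mismatch most strongly inhibits extension.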

Ligation/Amplification Based Methods

The "Oligonucleotide Ligation Assay" (OLA) uses two oligonucleotides which are designed
to be capable of hybridizing to abutting sequences of a single strand of a target molecule. One of
the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise
complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that
their termini abut, and create a ligation substrate that can be captured and detected. OLA is capable
of detecting single nucleotide polymorphisms and may be advantageously combined with PCR as
described by Nickerson et al. (1990). In this method, PCR is used to achieve the exponential
amplification of target DNA, which is then detected using OLA.

Other amplification methods which are particularly suited for the detection of single
nucleotide polymorphisms include the ligase chain reaction (LCR) and Gap LCR (GLCR), which are
described above in "Amplification of the HKLP gene". LCR uses two pairs of probes to
exponentially amplify a specific target. The sequences of each pair of oligonucleotides are selected
to permit the pair to hybridize to abutting sequences of the same strand of the target. Such
hybridization forms a substrate for a template-dependent ligase. In accordance with the present
invention, LCR can be performed with oligonucleotides having the proximal and distal sequences of
the same strand of a biallelic marker site. In one embodiment, either oligonucleotide will be
designed to include the biallelic marker site. In such an embodiment, the reaction conditions are
selected such that the oligonucleotides can be ligated together only if the target molecule either
contains or lacks the specific nucleotide that is complementary to the biallelic marker on the
oligonucleotide. In an alternative embodiment, the oligonucleotides will not include the biallelic
marker, such that when they hybridize to the target molecule, a "gap" is created as described in WO
90/01069. This gap is then "filled" with complementary dNTPs (as mediated by DNA polymerase),
or by an additional pair of oligonucleotides. Thus at the end of each cycle, each single strand has a
complement capable of serving as a target during the next cycle and exponential allele-specific
amplification of the desired sequence is obtained.
Ligase/Polymerase-mediated Genetic Bit Analysis™ is another method for determining the
identity of a nucleotide at a preselected site in a nucleic acid molecule (WO 95/21271). This method
involves the incorporation of a nucleoside triphosphate that is complementary to the nucleotide
present at the preselected site onto the terminus of a primer molecule, and their subsequent ligation
to a second oligonucleotide. The reaction is monitored by detecting a specific label attached to the
reaction's solid phase or by detection in solution.
4) Hybridization Assay Methods 

A preferred method of determining the identity of the nucleotide present at a biallelic marker 
site involves nucleic acid hybridization. The hybridization probes, which can be conveniently used 
in such reactions, preferably include the probes defined herein. Any hybridization assay may be
used including Southern hybridization, Northern hybridization, dot blot hybridization and solid-
phase hybridization (see Sambrook et al., 1989).

Hybridization refers to the formation of a duplex structure by two single stranded nucleic 
acids due to complementary base pairing. Hybridization can occur between exactly complementary 
nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch.
Specific probes can be designed that hybridize to one form of a biallelic marker and not to the other
and therefore are able to discriminate between different allelic forms. Allele-specific probes are 
often used in pairs, one member of a pair showing perfect match to a target sequence containing the 
original allele and the other showing a perfect match to the target sequence containing the alternative 
allele. Hybridization conditions should be sufficiently stringent that there is a significant difference
in hybridization intensity between alleles, and preferably an essentially binary response, whereby a
probe hybridizes to only one of the alleles. Stringent, sequence specific hybridization conditions, 
under which a probe will hybridize only to the exactly complementary target sequence are well 
known in the art (Sambrook et al., 1989). Stringent conditions are sequence dependent and will be 
different in different circumstances. Generally, stringent conditions are selected to be about 5°C
lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and
pH. Although such hybridizations can be performed in solution, it is preferred to employ a solid- 
phase hybridization assay. The target DNA comprising a biallelic marker of the present invention 
may be amplified prior to the hybridization reaction. The presence of a specific allele in the sample 
is determined by detecting the presence or the absence of stable hybrid duplexes formed between the
probe and the target DNA. The detection of hybrid duplexes can be carried out by a number of
methods. Various detection assay formats are well known which utilize detectable labels bound to 
either the target or the probe to enable detection of the hybrid duplexes. Typically, hybridization 
duplexes are separated from unhybridized nucleic acids and the labels bound to the duplexes are then
detected. Those skilled in the art will recognize that wash steps may be employed to wash away 
excess target DNA or probe as well as unbound conjugate. Further, standard heterogeneous assay 
formats are suitable for detecting the hybrids using the labels present on the primers and probes.

Two recently developed assays allow hybridization-based allele discrimination with no need
for separations or washes (see Landegren U. et al., 1998). The TaqMan assay takes advantage of
the 5' nuclease activity of Taq DNA polymerase to digest a DNA probe annealed specifically to the 
accumulating amplification product. TaqMan probes are labeled with a donor-acceptor dye pair that 
interacts via fluorescence energy transfer. Cleavage of the TaqMan probe by the advancing
polymerase during amplification dissociates the donor dye from the quenching acceptor dye, greatly
increasing the donor fluorescence. All reagents necessary to detect two allelic variants can be
assembled at the beginning of the reaction and the results are monitored in real time (see Livak et al., 
1995). In an alternative homogeneous hybridization based procedure, molecular beacons are used 
for allele discrimination. Molecular beacons are hairpin-shaped oligonucleotide probes that report
the presence of specific nucleic acids in homogeneous solutions. When they bind to their targets
they undergo a conformational reorganization that restores the fluorescence of an internally 
quenched fluorophore (Tyagi et al., 1998). 

The polynucleotides provided herein can be used to produce probes which can be used in
hybridization assays for the detection of biallelic marker alleles in biological samples. These probes
are characterized in that they preferably comprise between 8 and 50 nucleotides, and in that they are
sufficiently complementary to a sequence comprising a biallelic marker of the present invention to
hybridize thereto and preferably sufficiently specific to be able to discriminate target sequences
differing by only one nucleotide. A particularly preferred probe is 25 nucleotides in length.
Preferably the biallelic marker is within 4 nucleotides of the center of the polynucleotide probe. In
particularly preferred probes, the biallelic marker is at the center of said polynucleotide. Preferred
probes comprise a nucleotide sequence selected from the group consisting of amplicons listed in
Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. In preferred embodiments the
polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more
preferably at the center of said polynucleotide.
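
As a non-limiting sketch, a pair of allele-specific probes of 25 nucleotides with the biallelic marker at the center could be assembled as shown below. The function name and the example flanking sequences are hypothetical; the probes are written in the sense of the strand supplied, and taking the complement for hybridization is left out for brevity.

```python
def allele_specific_probes(upstream, alleles, downstream, length=25):
    """Return one probe per allele with the polymorphic base centered.

    `upstream` and `downstream` are the sequences flanking the marker;
    an odd `length` is required so that a single central base exists.
    """
    if length < 3 or length % 2 == 0:
        raise ValueError("length must be odd and at least 3")
    half = length // 2
    if len(upstream) < half or len(downstream) < half:
        raise ValueError("not enough flanking sequence")
    left = upstream[-half:].upper()
    right = downstream[:half].upper()
    return {allele: left + allele.upper() + right for allele in alleles}


# Hypothetical flanks and alleles, short probes for readability.
short_probes = allele_specific_probes("ACGTACG", ("C", "T"), "TTGCAAT", length=7)

# The same call with the 25-nucleotide default on longer hypothetical flanks.
probes_25 = allele_specific_probes("ACGTACGTACGTAC", ("C", "T"), "GTCAGTCAGTCAGT")
```

Each pair differs at index `length // 2` only, so hybridization differences between the two probes can be attributed to the marker.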

Preferably the probes of the present invention are labeled or immobilized on a solid support. 
Labels and solid supports are further described in "Oligonucleotide Probes and Primers". The 
probes can be non-extendable as described in "Oligonucleotide Probes and Primers". 

By assaying the hybridization to an allele specific probe, one can detect the presence or
absence of a biallelic marker allele in a given sample. High-throughput parallel hybridizations in
array format are specifically encompassed within "hybridization assays" and are described below.
5) Hybridization To Addressable Arrays Of Oligonucleotides

Hybridization assays based on oligonucleotide arrays rely on the differences in hybridization 
stability of short oligonucleotides to perfectly matched and mismatched target sequence variants. 
Efficient access to polymorphism information is obtained through a basic structure comprising high-
density arrays of oligonucleotide probes attached to a solid support (e.g., the chip) at selected
positions. Each DNA chip can contain thousands to millions of individual synthetic DNA probes 
arranged in a grid-like pattern and miniaturized to the size of a dime. 

The chip technology has already been applied with success in numerous cases. For example, 
the screening of mutations has been undertaken in the BRCA1 gene, in S. cerevisiae mutant strains, 
and in the protease gene of HIV-1 virus (Hacia et al., 1996; Shoemaker et al., 1996; Kozal et al.,
1996). Chips of various formats for use in detecting biallelic polymorphisms can be produced on a
customized basis by Affymetrix (GeneChip™), Hyseq (HyChip and HyGnostics), and Protogene
Laboratories. 

In general, these methods employ arrays of oligonucleotide probes that are complementary
to target nucleic acid sequence segments from an individual, which target sequences include a
polymorphic marker. EP 785280 describes a tiling strategy for the detection of single nucleotide 
polymorphisms. Briefly, arrays may generally be "tiled" for a large number of specific 
polymorphisms. By "tiling" is generally meant the synthesis of a defined set of oligonucleotide 
probes which is made up of a sequence complementary to the target sequence of interest, as well as
preselected variations of that sequence, e.g., substitution of one or more given positions with one or
more members of the basis set of monomers, i.e. nucleotides. Tiling strategies are further described
in PCT application No. WO 95/11995. In a particular aspect, arrays are tiled for a number of
specific, identified biallelic marker sequences. In particular, the array is tiled to include a number of 
detection blocks, each detection block being specific for a specific biallelic marker or a set of
biallelic markers. For example, a detection block may be tiled to include a number of probes, which
span the sequence segment that includes a specific polymorphism. To ensure that probes are
complementary to each allele, the probes are synthesized in pairs differing at the biallelic marker. In
addition to the probes differing at the polymorphic base, monosubstituted probes are also generally 
tiled within the detection block. These monosubstituted probes have bases at and up to a certain
number of bases in either direction from the polymorphism, substituted with the remaining
nucleotides (selected from A, T, G, C and U). Typically the probes in a tiled detection block will
include substitutions of the sequence positions up to and including those that are 5 bases away from 
the biallelic marker. The monosubstituted probes provide internal controls for the tiled array, to 
distinguish actual hybridization from artefactual cross-hybridization. Upon completion of
hybridization with the target sequence and washing of the array, the array is scanned to determine
the position on the array to which the target sequence hybridizes. The hybridization data from the 
scanned array is then analyzed to identify which allele or alleles of the biallelic marker are present in 
the sample. Hybridization and scanning may be carried out as described in PCT application No. WO
92/10092 and WO 95/11995 and US patent No. 5,424,186.
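
The composition of one tiled detection block, as outlined above, may be illustrated by the following sketch. It is not the tiling procedure of EP 785280 or WO 95/11995: the function name, the 25-nucleotide probe length (12 bases on either side of the marker), the example flanks and the restriction to the DNA alphabet A, C, G, T are all assumptions, and the probes are written in the sense of the target strand rather than as its complement.

```python
def tile_detection_block(upstream, alleles, downstream, half=12, span=5):
    """Return (kind, allele, offset, sequence) tuples for one detection block.

    The block holds the paired allele probes plus monosubstituted control
    probes at every position within `span` bases of the marker.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    if not 1 <= span <= half:
        raise ValueError("span must lie between 1 and half")
    if len(upstream) < half or len(downstream) < half:
        raise ValueError("not enough flanking sequence")
    block, seen = [], set()

    def add(kind, allele, offset, seq):
        if seq not in seen:  # skip sequences already present in the block
            seen.add(seq)
            block.append((kind, allele, offset, seq))

    # Paired probes differing only at the biallelic marker.
    references = {a: upstream[-half:] + a + downstream[:half] for a in alleles}
    for allele, ref in references.items():
        add("allele", allele, 0, ref)
    # Monosubstituted controls at and around the marker.
    for allele, ref in references.items():
        for offset in range(-span, span + 1):
            pos = half + offset
            for base in "ACGT":
                if base != ref[pos]:
                    add("control", allele, offset, ref[:pos] + base + ref[pos + 1:])
    return block


# Hypothetical flanks and alleles.
block = tile_detection_block("ACGTACGTACGTAC", ("A", "G"), "GTCAGTCAGTCAGT")
```

With these defaults the block contains 2 allele probes and 62 distinct control probes; substitutions at the marker position that would merely recreate the other allele probe are not duplicated.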

Thus, in some embodiments, the chips may comprise an array of nucleic acid sequences of
fragments of about 15 nucleotides in length. In further embodiments, the chip may comprise an
array including at least one of the sequences selected from the group consisting of amplicons listed
in Table 1 and the sequences complementary thereto, or a fragment thereof, said fragment comprising
at least about 8 consecutive nucleotides, preferably 10, 15, 20, more preferably 25, 30, 40, 47, or 50
consecutive nucleotides and containing a polymorphic base. In preferred embodiments the
polymorphic base is within 5, 4, 3, 2, or 1 nucleotides of the center of the said polynucleotide, more
preferably at the center of said polynucleotide. In some embodiments, the chip may comprise an
array of at least 2, 3, 4, 5, 6, 7, 8 or more of these polynucleotides of the invention. Solid supports
and polynucleotides of the present invention attached to solid supports are further described in
"Oligonucleotide Probes and Primers".
6) Integrated Systems 

Another technique, which may be used to analyze polymorphisms, includes multicomponent
integrated systems, which miniaturize and compartmentalize processes such as PCR and capillary
electrophoresis reactions in a single functional device. An example of such a technique is disclosed in
US patent 5,589,136, which describes the integration of PCR amplification and capillary 
electrophoresis in chips. 

Integrated systems can be envisaged mainly when microfluidic systems are used. These
systems comprise a pattern of microchannels designed onto a glass, silicon, quartz, or plastic wafer
included on a microchip. The movements of the samples are controlled by electric, electroosmotic 
or hydrostatic forces applied across different areas of the microchip to create functional microscopic 
valves and pumps with no moving parts. 

For genotyping biallelic markers, the microfluidic system may integrate nucleic acid
amplification, microsequencing, capillary electrophoresis and a detection method such as laser-
induced fluorescence detection.

Methods Of Genetic Analysis Using The Biallelic Markers Of The Present Invention 
Different methods are available for the genetic analysis of complex traits (see Lander and
Schork, 1994). The search for disease-susceptibility genes is conducted using two main methods:
the linkage approach in which evidence is sought for cosegregation between a locus and a putative
trait locus using family studies, and the association approach in which evidence is sought for a
statistically significant association between an allele and a trait or a trait causing allele (Khoury et
al., 1993). In general, the biallelic markers of the present invention find use in any method known in
the art to demonstrate a statistically significant correlation between a genotype and a phenotype.
The biallelic markers may be used in parametric and non-parametric linkage analysis methods.
Preferably, the biallelic markers of the present invention are used to identify genes associated with
detectable traits using association studies, an approach which does not require the use of affected
families and which permits the identification of genes associated with complex and sporadic traits.
The genetic analysis using the biallelic markers of the present invention may be conducted 
on any scale. The whole set of biallelic markers of the present invention or any subset of biallelic
markers of the present invention corresponding to the candidate gene may be used. Further, any set 
of genetic markers including a biallelic marker of the present invention may be used. A set of 
biallelic polymorphisms that could be used as genetic markers in combination with the biallelic 
markers of the present invention has been described in WO 98/20165. As mentioned above, it 
should be noted that the biallelic markers of the present invention may be included in any complete
or partial genetic map of the human genome. These different uses are specifically contemplated in 
the present invention and claims. 

Linkage Analysis 

Linkage analysis is based upon establishing a correlation between the transmission of 
genetic markers and that of a specific trait throughout generations within a family. Thus, the aim of
linkage analysis is to detect marker loci that show cosegregation with a trait of interest in pedigrees.
Parametric Methods 

When data are available from successive generations there is the opportunity to study the 
degree of linkage between pairs of loci. Estimates of the recombination fraction enable loci to be
ordered and placed onto a genetic map. With loci that are genetic markers, a genetic map can be
established, and then the strength of linkage between markers and traits can be calculated and used 
to indicate the relative positions of markers and genes affecting those traits (Weir, 1996). The 
classical method for linkage analysis is the logarithm of odds (lod) score method (see Morton, 1955; 
Ott, 1991). Calculation of lod scores requires specification of the mode of inheritance for the
disease (parametric method). Generally, the length of the candidate region identified using linkage
analysis is between 2 and 20 Mb. Once a candidate region is identified as described above, analysis
of recombinant individuals using additional markers allows further delineation of the candidate 
region. Linkage analysis studies have generally relied on the use of a maximum of 5,000 
microsatellite markers, thus limiting the maximum theoretical attainable resolution of linkage
analysis to about 600 kb on average.
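
For illustration, a lod score in the simplest textbook setting and the arithmetic behind the 600 kb figure can be sketched as follows. The lod function assumes phase-known, fully informative meioses, which is a simplification and not a statement of how any particular pedigree analysis is carried out; the function names, the example counts and the approximate genome size of 3,000 Mb are assumptions introduced here.

```python
import math


def lod_score(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known meioses (assumed setting)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def average_marker_spacing_kb(genome_size_mb, marker_count):
    """Average distance between evenly spread markers, in kilobases."""
    return genome_size_mb * 1000 / marker_count


# Hypothetical data: 1 recombinant among 10 meioses, evaluated at theta = 0.1.
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)

# About 3,000 Mb (assumed genome size) shared among 5,000 markers.
example_spacing = average_marker_spacing_kb(3000, 5000)
```

Under these assumptions the example lod score is about 1.6, and the spacing works out to 600 kb, consistent with the resolution limit stated above.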

Linkage analysis has been successfully applied to map simple genetic traits that show clear 
Mendelian inheritance patterns and which have a high penetrance (i.e., the ratio between the number 
of trait positive carriers of allele a and the total number of a carriers in the population). However, 
parametric linkage analysis suffers from a variety of drawbacks. First, it is limited by its reliance on
the choice of a genetic model suitable for each studied trait. Furthermore, as already mentioned, the
resolution attainable using linkage analysis is limited, and complementary studies are required to 
refine the analysis of the typical 2 Mb to 20 Mb regions initially identified through linkage analysis.
In addition, parametric linkage analysis approaches have proven difficult when applied to complex
genetic traits, such as those due to the combined action of multiple genes and/or environmental
factors. It is very difficult to model these factors adequately in a lod score analysis. In such cases,
too large an effort and cost are needed to recruit the adequate number of affected families required
for applying linkage analysis to these situations, as recently discussed by Risch, N. and Merikangas,
K. (1996).

Non-Parametric Methods 

The advantage of the so-called non-parametric methods for linkage analysis is that they do 
not require specification of the mode of inheritance for the disease; they tend to be more useful for
the analysis of complex traits. In non-parametric methods, one tries to prove that the inheritance 
pattern of a chromosomal region is not consistent with random Mendelian segregation by showing 
that affected relatives inherit identical copies of the region more often than expected by chance. 
Affected relatives should show excess "allele sharing" even in the presence of incomplete 
penetrance and polygenic inheritance. In non-parametric linkage analysis the degree of agreement at
a marker locus in two individuals can be measured either by the number of alleles identical by state 
(IBS) or by the number of alleles identical by descent (IBD). Affected sib pair analysis is a well- 
known special case and is the simplest form of these methods. 
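
The number of alleles identical by state at a marker locus can be counted directly from two genotypes, as in the minimal sketch below. The function name and example genotypes are hypothetical. Identity by descent is not computed here, since it additionally requires pedigree information on the origin of each allele.

```python
from collections import Counter


def ibs_count(genotype_1, genotype_2):
    """Number of alleles (0, 1 or 2) shared identical by state at one locus.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    """
    shared = Counter(genotype_1) & Counter(genotype_2)  # multiset intersection
    return sum(shared.values())


# Hypothetical sib-pair genotypes at a single biallelic marker.
ibs_examples = [
    ibs_count(("A", "G"), ("G", "A")),
    ibs_count(("A", "G"), ("A", "A")),
    ibs_count(("A", "A"), ("G", "G")),
]
```

Because a biallelic marker has only two alleles, unrelated individuals frequently share alleles by state, which is one reason for pooling several adjacent markers.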

The biallelic markers of the present invention may be used in both parametric and non-
parametric linkage analysis. Preferably biallelic markers may be used in non-parametric methods
which allow the mapping of genes involved in complex traits. The biallelic markers of the present 
invention may be used in both IBD- and IBS- methods to map genes affecting a complex trait. In 
such studies, taking advantage of the high density of biallelic markers, several adjacent biallelic 
marker loci may be pooled to achieve the efficiency attained by multi-allelic markers (Zhao et al., 
1998).

Population Association Studies 

The present invention comprises methods for identifying whether the HKLP gene is associated
with a detectable trait using the biallelic markers of the present invention. In one embodiment the
present invention comprises methods to detect an association between a biallelic marker allele or a
biallelic marker haplotype and a trait. Further, the invention comprises methods to identify a trait
causing allele in linkage disequilibrium with any biallelic marker allele of the present invention. 

As described above, alternative approaches can be employed to perform association studies: 
genome-wide association studies, candidate region association studies and candidate gene 
association studies. In a preferred embodiment, the biallelic markers of the present invention are
used to perform candidate gene association studies. The candidate gene analysis clearly provides a
short-cut approach to the identification of genes and gene polymorphisms related to a particular trait 
when some information concerning the biology of the trait is available. Further, the biallelic
markers of the present invention may be incorporated in any map of genetic markers of the human 
genome in order to perform genome-wide association studies. Methods to generate a high-density
map of biallelic markers have been described in US Provisional Patent application serial number
60/082,614. The biallelic markers of the present invention may further be incorporated in any map
of a specific candidate region of the genome (a specific chromosome or a specific chromosomal 
segment for example). 

As mentioned above, association studies may be conducted within the general population 
and are not limited to studies performed on related individuals in affected families. Association
studies are extremely valuable as they permit the analysis of sporadic or multifactor traits.
Moreover, association studies represent a powerful method for fine-scale mapping enabling much
finer mapping of trait causing alleles than linkage studies. Studies based on pedigrees often only 
narrow the location of the trait causing allele. Association studies using the biallelic markers of the 
present invention can therefore be used to refine the location of a trait causing allele in a candidate
region identified by linkage analysis methods. Moreover, once a chromosome segment of interest
has been identified, the presence of a candidate gene such as a candidate gene of the present 
invention, in the region of interest can provide a shortcut to the identification of the trait causing 
allele. Biallelic markers of the present invention can be used to demonstrate that a candidate gene is 
associated with a trait. Such uses are specifically contemplated in the present invention. 

Determining The Frequency Of A Biallelic Marker Allele Or Of A Biallelic Marker
Haplotype In A Population

Association studies explore the relationships among frequencies for sets of alleles between
loci.

Determining The Frequency Of An Allele In A Population 

Allelic frequencies of the biallelic markers in a population can be determined using one of
the methods described above under the heading "Methods for genotyping an individual for biallelic
markers", or any genotyping procedure suitable for this intended purpose. Genotyping pooled 
samples or individual samples can determine the frequency of a biallelic marker allele in a 
population. One way to reduce the number of genotypings required is to use pooled samples. A
major obstacle in using pooled samples is the accuracy and reproducibility with which DNA
concentrations can be determined when setting up the pools. Genotyping individual samples provides
higher sensitivity, reproducibility and accuracy and is the preferred method used in the present
invention. Preferably, each individual is genotyped separately and simple gene counting is applied 
to determine the frequency of an allele of a biallelic marker or of a genotype in a given population. 
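
Simple gene counting over individually genotyped samples can be sketched as follows. The function name and the example genotypes are hypothetical and serve only to show the arithmetic: each diploid individual contributes two alleles to the count.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one marker by direct gene counting.

    `genotypes` is a list of unordered allele pairs, one per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two alleles per individual
    return {allele: count / total for allele, count in counts.items()}


# Hypothetical sample of four individuals genotyped at one biallelic marker.
sample = [("A", "A"), ("A", "G"), ("A", "A"), ("G", "G")]
frequencies = allele_frequencies(sample)
```

In this example allele A is counted 5 times and allele G 3 times among 8 chromosomes, giving frequencies of 0.625 and 0.375.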
Determining The Frequency Of A Haplotype In A Population

The gametic phase of haplotypes is unknown when diploid individuals are heterozygous at 
more than one locus. Using genealogical information in families, gametic phase can sometimes be
inferred (Perlin et al., 1994). When no genealogical information is available, different strategies may
be used. One possibility is that the multiple-site heterozygous diploids can be eliminated from the
analysis, keeping only the homozygotes and the single-site heterozygote individuals, but this 
approach might lead to a possible bias in the sample composition and the underestimation of low- 
frequency haplotypes. Another possibility is that single chromosomes can be studied independently, 
for example, by asymmetric PCR amplification (see Newton et al., 1989; Wu et al., 1989) or by
isolation of a single chromosome by limiting dilution followed by PCR amplification (see Ruano et al.,
1990). Further, a sample may be haplotyped for sufficiently close biallelic markers by double PCR 
amplification of specific alleles (Sarkar, G. and Sommer S. S., 1991). These approaches are not 
entirely satisfying either because of their technical complexity, the additional cost they entail, their 
lack of generalization at a large scale, or the possible biases they introduce. To overcome these
difficulties, an algorithm to infer the phase of PCR-amplified DNA genotypes introduced by Clark,
A.G. (1990) may be used. Briefly, the principle is to start filling a preliminary list of haplotypes
present in the sample by examining unambiguous individuals, that is, the complete homozygotes and 
the single-site heterozygotes. Then other individuals in the same sample are screened for the 
possible occurrence of previously recognized haplotypes. For each positive identification, the
complementary haplotype is added to the list of recognized haplotypes, until the phase information
for all individuals is either resolved or identified as unresolved. This method assigns a single 
haplotype to each multiheterozygous individual, whereas several haplotypes are possible when there
is more than one heterozygous site. Alternatively, one can use methods estimating haplotype
frequencies in a population without assigning haplotypes to each individual. Preferably, a method
based on an expectation-maximization (EM) algorithm (Dempster et al., 1977) leading to maximum-
likelihood estimates of haplotype frequencies under the assumption of Hardy-Weinberg proportions
(random mating) is used (see Excoffier L. and Slatkin M., 1995). The EM algorithm is a generalized
iterative maximum-likelihood approach to estimation that is useful when data are ambiguous and/or 
incomplete. The EM algorithm is used to resolve heterozygotes into haplotypes. Haplotype 

30 estimations are further described below under the heading "Statistical Methods." Any other method 
known in the art to determine or to estimate the frequency of a haplotype in a population may be 
used. 
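
The phase-inference principle attributed to Clark above can be sketched in a few lines. This is only an illustrative sketch, not the published implementation: the function name `clark_phase` and the genotype coding (0 or 1 for a site homozygous for that allele, 2 for a heterozygous site) are assumptions made for this example.

```python
def clark_phase(genotypes):
    """Resolve unphased genotypes into haplotype pairs (Clark-style sketch).

    Each genotype is a tuple with one entry per biallelic marker:
    0 or 1 = homozygous for that allele, 2 = heterozygous (assumed coding).
    """
    known = set()      # list of recognized haplotypes
    resolved = {}      # individual index -> (haplotype, haplotype)

    # Unambiguous individuals: complete homozygotes and single-site heterozygotes.
    for idx, g in enumerate(genotypes):
        if g.count(2) <= 1:
            h1 = tuple(0 if s == 2 else s for s in g)
            h2 = tuple(1 if s == 2 else s for s in g)
            resolved[idx] = (h1, h2)
            known.update((h1, h2))

    # Screen the remaining individuals for previously recognized haplotypes.
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            for h in sorted(known):
                if all(s == 2 or s == a for a, s in zip(h, g)):
                    # The complementary haplotype joins the recognized list.
                    comp = tuple(1 - a if s == 2 else a for a, s in zip(h, g))
                    resolved[idx] = (h, comp)
                    known.add(comp)
                    progress = True
                    break

    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved


sample = [(0, 0, 0), (1, 1, 2), (2, 2, 0)]
phased, unresolved = clark_phase(sample)
```

The sketch assigns the first compatible recognized haplotype it meets, so the result can depend on the order of examination, which is one of the limitations of this kind of method noted above.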

The invention also encompasses methods of estimating the frequency of a haplotype for a set 
of biallelic markers in a population, comprising the steps of: a) genotyping at least one HKLP-
related biallelic marker according to a method of the invention for each individual in said
population; b) genotyping a second biallelic marker by determining the identity of the nucleotides at
said second biallelic marker for both copies of said second biallelic marker present in the genome of
each individual in said population; and c) applying a haplotype determination method to the
identities of the nucleotides determined in steps a) and b) to obtain an estimate of said frequency. In
addition, the methods of estimating the frequency of a haplotype of the invention encompass
methods with any further limitation described in this disclosure, or those following, specified alone
or in any combination: Optionally, said HKLP-related biallelic marker is selected from the group
consisting of A1 to A32, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, said HKLP-related biallelic marker is selected from the group
consisting of A1 to A17, and A20 to A22, and the complements thereof, or optionally the biallelic
markers in linkage disequilibrium therewith; optionally, said HKLP-related biallelic marker is
selected from the group consisting of A23 and A24, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, said haplotype determination
method is performed by asymmetric PCR amplification, double PCR amplification of specific 
alleles, the Clark algorithm, or an expectation-maximization algorithm. 

Linkage Disequilibrium Analysis 

Linkage disequilibrium is the non-random association of alleles at two or more loci and
represents a powerful tool for mapping genes involved in disease traits (see Ajioka R.S. et al., 1997).
Biallelic markers, because they are densely spaced in the human genome and can be genotyped in 
greater numbers than other types of genetic markers (such as RFLP or VNTR markers), are 
particularly useful in genetic analysis based on linkage disequilibrium. 

When a disease mutation is first introduced into a population (by a new mutation or the
immigration of a mutation carrier), it necessarily resides on a single chromosome and thus on a
single "background" or "ancestral" haplotype of linked markers. Consequently, there is complete
disequilibrium between these markers and the disease mutation: one finds the disease mutation only
in the presence of a specific set of marker alleles. Through subsequent generations recombination
events occur between the disease mutation and these marker polymorphisms, and the disequilibrium
gradually dissipates. The pace of this dissipation is a function of the recombination frequency, so
the markers closest to the disease gene will manifest higher levels of disequilibrium than those that
are further away. When not broken up by recombination, "ancestral" haplotypes and linkage
disequilibrium between marker alleles at different loci can be tracked not only through pedigrees but
also through populations. Linkage disequilibrium is usually seen as an association between one
specific allele at one locus and another specific allele at a second locus. 
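
The dependence of this dissipation on recombination frequency can be illustrated numerically. The sketch below uses the textbook relation D_t = D_0 (1 - theta)^t, which is not stated in this text and holds only under idealized assumptions (random mating, no selection, mutation or drift); the function and parameter names are hypothetical.

```python
def ld_after_generations(d0, theta, generations):
    """Expected disequilibrium after a number of generations.

    d0: initial disequilibrium coefficient
    theta: recombination frequency between the two loci (0 to 0.5)
    Assumes the idealized decay D_t = D_0 * (1 - theta) ** t.
    """
    return d0 * (1.0 - theta) ** generations


# A tightly linked marker keeps far more disequilibrium than a distant one.
for theta in (0.001, 0.01, 0.1):
    print(theta, ld_after_generations(0.25, theta, 100))
```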

The pattern or curve of disequilibrium between disease and marker loci is expected to 
exhibit a maximum that occurs at the disease locus. Consequently, the amount of linkage 
disequilibrium between a disease allele and closely linked genetic markers may yield valuable
information regarding the location of the disease gene. For fine-scale mapping of a disease locus, it
is useful to have some knowledge of the patterns of linkage disequilibrium that exist between
markers in the studied region. As mentioned above, the mapping resolution achieved through the
analysis of linkage disequilibrium is much higher than that of linkage studies. The high density of
biallelic markers combined with linkage disequilibrium analysis provides powerful tools for fine-
scale mapping. Different methods to calculate linkage disequilibrium are described below under the
heading "Statistical Methods."

Population-Based Case-Control Studies Of Trait-Marker Associations 
As mentioned above, the occurrence of pairs of specific alleles at different loci on the same 
chromosome is not random and the deviation from random is called linkage disequilibrium. 
Association studies focus on population frequencies and rely on the phenomenon of linkage
disequilibrium. If a specific allele in a given gene is directly involved in causing a particular trait, its
frequency will be statistically increased in an affected (trait positive) population, when compared to
the frequency in a trait negative population or in a random control population. As a consequence of
the existence of linkage disequilibrium, the frequency of all other alleles present in the haplotype
carrying the trait-causing allele will also be increased in trait positive individuals compared to trait
negative individuals or random controls. Therefore, association between the trait and any allele
(specifically a biallelic marker allele) in linkage disequilibrium with the trait-causing allele will
suffice to suggest the presence of a trait-related gene in that particular region. Case-control
populations can be genotyped for biallelic markers to identify associations that narrowly locate a
trait-causing allele. Because any marker in linkage disequilibrium with one given marker associated
with a trait will itself be associated with the trait, linkage disequilibrium allows the relative frequencies in
case-control populations of a limited number of genetic polymorphisms (specifically biallelic 
markers) to be analyzed as an alternative to screening all possible functional polymorphisms in order 
to find trait-causing alleles. Association studies compare the frequency of marker alleles in 
unrelated case-control populations, and represent powerful tools for the dissection of complex traits. 

Case-Control Populations (Inclusion Criteria)

Population-based association studies do not concern familial inheritance but compare the 
prevalence of a particular genetic marker, or a set of markers, in case-control populations. They are 
case-control studies based on comparison of unrelated case (affected or trait positive) individuals 
and unrelated control (unaffected, trait negative or random) individuals. Preferably the control
group is composed of unaffected or trait negative individuals. Further, the control group is
ethnically matched to the case population. Moreover, the control group is preferably matched to the
case population for the main known confounding factor for the trait under study (for example age-
matched for an age-dependent trait). Ideally, individuals in the two samples are paired in such a way
that they are expected to differ only in their disease status. The terms "trait positive population",
"case population" and "affected population" are used interchangeably herein.

An important step in the dissection of complex traits using association studies is the choice
of case-control populations (see Lander and Schork, 1994). A major step in the choice of case- 
control populations is the clinical definition of a given trait or phenotype. Any genetic trait may be 
analyzed by the association method proposed here by carefully selecting the individuals to be
included in the trait positive and trait negative phenotypic groups. Four criteria are often useful:
clinical phenotype, age at onset, family history and severity. The selection procedure for continuous
or quantitative traits (such as blood pressure for example) involves selecting individuals at opposite
ends of the phenotype distribution of the trait under study, so as to include in these trait positive and
trait negative populations individuals with non-overlapping phenotypes. Preferably, case-control
populations consist of phenotypically homogeneous populations. Trait positive and trait negative
populations consist of phenotypically uniform populations of individuals representing each between
1 and 98%, preferably between 1 and 80%, more preferably between 1 and 50%, and more
preferably between 1 and 30%, most preferably between 1 and 20% of the total population under
study, and preferably selected among individuals exhibiting non-overlapping phenotypes. The
clearer the difference between the two trait phenotypes, the greater the probability of detecting an
association with biallelic markers. The selection of those drastically different but relatively uniform
phenotypes enables efficient comparisons in association studies and the possible detection of marked
differences at the genetic level, provided that the sample sizes of the populations under study are
large enough.

In preferred embodiments, a first group of between 50 and 300 trait positive individuals,
preferably about 100 individuals, are recruited according to their phenotypes. A similar number of
control individuals are included in such studies.

Association Analysis

The invention also comprises methods of detecting an association between a genotype and a
phenotype, comprising the steps of: a) determining the frequency of at least one HKLP-related
biallelic marker in a trait positive population according to a genotyping method of the invention; b)
determining the frequency of said HKLP-related biallelic marker in a control population according to
a genotyping method of the invention; and c) determining whether a statistically significant
association exists between said genotype and said phenotype. In addition, the methods of detecting
an association between a genotype and a phenotype of the invention encompass methods with any
further limitation described in this disclosure, or those following, specified alone or in any
combination: Optionally, said HKLP-related biallelic marker is selected from the group consisting of
A1 to A32, and the complements thereof, or optionally the biallelic markers in linkage
disequilibrium therewith; optionally, said HKLP-related biallelic marker is selected from the group
consisting of A1 to A17, and A20 to A22, and the complements thereof, or optionally the biallelic
markers in linkage disequilibrium therewith; optionally, said HKLP-related biallelic marker is
selected from the group consisting of A23 and A24, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, said control population may be a
trait negative population, or a random population; optionally, each of said genotyping steps a) and
b) may be performed on a pooled biological sample derived from each of said populations;
optionally, each of said genotyping steps a) and b) is performed separately on biological samples
derived from each individual in said population or a subsample thereof.

The general strategy to perform association studies using biallelic markers derived from a 
region carrying a candidate gene is to scan two groups of individuals (case-control populations) in 
order to measure and statistically compare the allele frequencies of the biallelic markers of the 
present invention in both groups. 

If a statistically significant association with a trait is identified for at least one of the
analyzed biallelic markers, one can assume that: either the associated allele is directly responsible
for causing the trait (i.e. the associated allele is the trait causing allele), or more likely the associated
allele is in linkage disequilibrium with the trait causing allele. The specific characteristics of the
associated allele with respect to the candidate gene function usually give further insight into the
relationship between the associated allele and the trait (causal or in linkage disequilibrium). If the
evidence indicates that the associated allele within the candidate gene is most probably not the trait 
causing allele but is in linkage disequilibrium with the real trait causing allele, then the trait causing 
allele can be found by sequencing the vicinity of the associated marker, and performing further 
association studies with the polymorphisms that are revealed in an iterative manner. 

Association studies are usually run in two successive steps. In a first phase, the frequencies
of a reduced number of biallelic markers from the candidate gene are determined in the trait positive
and control populations. In a second phase of the analysis, the position of the genetic loci
responsible for the given trait is further refined using a higher density of markers from the relevant
region. However, if the candidate gene under study is relatively small in length, as is the case for
HKLP, a single phase may be sufficient to establish significant associations.

Haplotype Analysis

As described above, when a chromosome carrying a disease allele first appears in a 
population as a result of either mutation or migration, the mutant allele necessarily resides on a 
chromosome having a set of linked markers: the ancestral haplotype. This haplotype can be tracked
through populations and its statistical association with a given trait can be analyzed.
Complementing single point (allelic) association studies with multi-point association studies, also
called haplotype studies, increases the statistical power of association studies. Thus, a haplotype
association study allows one to define the frequency and the type of the ancestral carrier haplotype.
A haplotype analysis is important in that it increases the statistical power of an analysis involving
individual markers.

In a first stage of a haplotype frequency analysis, the frequency of the possible haplotypes 
based on various combinations of the identified biallelic markers of the invention is determined.
The haplotype frequency is then compared for distinct populations of trait positive and control
individuals. The number of trait positive individuals which should be subjected to this analysis to
obtain statistically significant results usually ranges between 30 and 300, with a preferred number of
individuals ranging between 50 and 150. The same considerations apply to the number of
unaffected individuals (or random control) used in the study. The results of this first analysis
provide haplotype frequencies in case-control populations; for each evaluated haplotype frequency a
p-value and an odds ratio are calculated. If a statistically significant association is found, the relative
risk for an individual carrying the given haplotype of being affected with the trait under study can be
approximated.

An additional embodiment of the present invention encompasses methods of detecting an
association between a haplotype and a phenotype, comprising the steps of: a) estimating the
frequency of at least one haplotype in a trait positive population, according to a method of the
invention for estimating the frequency of a haplotype; b) estimating the frequency of said haplotype
in a control population, according to a method of the invention for estimating the frequency of a
haplotype; and c) determining whether a statistically significant association exists between said
haplotype and said phenotype. In addition, the methods of detecting an association between a
haplotype and a phenotype of the invention encompass methods with any further limitation
described in this disclosure, or those following: Optionally, said HKLP-related biallelic marker is
selected from the group consisting of A1 to A32, and the complements thereof, or optionally the
biallelic markers in linkage disequilibrium therewith; optionally, said HKLP-related biallelic marker
is selected from the group consisting of A1 to A17, and A20 to A22, and the complements thereof,
or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said HKLP-related
biallelic marker is selected from the group consisting of A23 and A24, and the complements
thereof, or optionally the biallelic markers in linkage disequilibrium therewith; optionally, said
control population is a trait negative population, or a random population. Optionally, said method
comprises the additional steps of determining the phenotype in said trait positive and said control
populations prior to step c).

Interaction Analysis

The biallelic markers of the present invention may also be used to identify patterns of 
biallelic markers associated with detectable traits resulting from polygenic interactions. The analysis
of genetic interaction between alleles at unlinked loci requires individual genotyping using the
techniques described herein. The analysis of allelic interaction among a selected set of biallelic
markers with an appropriate level of statistical significance can be considered as a haplotype analysis.
Interaction analysis consists in stratifying the case-control populations with respect to a given
haplotype for the first loci and performing a haplotype analysis with the second loci with each
subpopulation. 

Statistical methods used in association studies are further described below.

Testing For Linkage In The Presence Of Association

The biallelic markers of the present invention may further be used in TDT
(transmission/disequilibrium test). TDT tests for both linkage and association and is not affected by
population stratification. TDT requires data for affected individuals and their parents or data from
unaffected sibs instead of from parents (see Spielmann S. et al., 1993; Schaid D.J. et al., 1996;
Spielmann S. and Ewens W.J., 1998). Such combined tests generally reduce the false-positive
errors produced by separate analyses. 
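
As an illustration, the TDT statistic in its commonly given McNemar-type form can be computed from the transmissions of heterozygous parents to affected offspring. The formula (b - c)^2 / (b + c), referred to a chi-square distribution with one degree of freedom, is not reproduced in this text and is supplied here as the usual textbook form; the function names are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT statistic for one marker allele.

    transmitted: times heterozygous parents passed the allele to an affected child
    untransmitted: times heterozygous parents passed the other allele instead
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


stat = tdt_statistic(60, 40)
p_value = chi_square_1df_p_value(stat)
```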

Statistical methods 

In general, any method known in the art to test whether a trait and a genotype show a
statistically significant correlation may be used.

1) Methods In Linkage Analysis 

Statistical methods and computer programs useful for linkage analysis are well-known to 
those skilled in the art (see Terwilliger J.D. and Ott J., 1994; Ott J., 1991). 

2) Methods To Estimate Haplotype Frequencies In A Population 

As described above, when genotypes are scored, it is often not possible to resolve the phase of
heterozygotes, so that haplotype frequencies cannot be easily inferred. When the gametic phase is
not known, haplotype frequencies can be estimated from the multilocus genotypic data. Any method
known to a person skilled in the art can be used to estimate haplotype frequencies (see Lange K.,
1997; Weir, B.S., 1996). Preferably, maximum-likelihood haplotype frequencies are computed using
an Expectation-Maximization (EM) algorithm (see Dempster et al., 1977; Excoffier L. and Slatkin
M., 1995). This procedure is an iterative process aiming at obtaining maximum-likelihood estimates
of haplotype frequencies from multi-locus genotype data when the gametic phase is unknown.
Haplotype estimations are usually performed by applying the EM algorithm using for example the
EM-HAPLO program (Hawley M. E. et al., 1994) or the Arlequin program (Schneider et al., 1997).
The EM algorithm is a generalized iterative maximum likelihood approach to estimation and is
briefly described below.

Please note that in the present section, "Methods To Estimate Haplotype Frequencies In A
Population," phenotypes will refer to multi-locus genotypes with unknown phase.
Genotypes will refer to known-phase multi-locus genotypes.

A sample of N unrelated individuals is typed for K markers. The data observed are the
unknown-phase K-locus phenotypes that can be categorized in F different phenotypes. Suppose that we
have H underlying possible haplotypes (in case of K biallelic markers, H = 2^K).

For phenotype j, suppose that c_j genotypes are possible. We thus have the following
equation:

$$P_j = \sum_{i=1}^{c_j} \mathrm{pr}(\mathrm{genotype}_i) = \sum_{i=1}^{c_j} \mathrm{pr}(h_k, h_l) \qquad \text{Equation 1}$$

where P_j is the probability of the phenotype j, and h_k and h_l are the two haplotypes constituting
the genotype i. Under the Hardy-Weinberg equilibrium, pr(h_k, h_l) becomes:

$$\mathrm{pr}(h_k, h_l) = \mathrm{pr}(h_k)^2 \ \text{ if } h_k = h_l, \qquad \mathrm{pr}(h_k, h_l) = 2\,\mathrm{pr}(h_k)\,\mathrm{pr}(h_l) \ \text{ if } h_k \neq h_l \qquad \text{Equation 2}$$

The successive steps of the E-M algorithm can be described as follows:

Starting with initial values of the haplotype frequencies, noted $p_1^{(0)}, p_2^{(0)}, \ldots, p_H^{(0)}$, these
initial values serve to estimate the genotype frequencies (Expectation step) and then estimate another
set of haplotype frequencies (Maximization step), noted $p_1^{(1)}, p_2^{(1)}, \ldots, p_H^{(1)}$; these two steps are
iterated until changes in the sets of haplotype frequencies are very small.

A stop criterion can be that the maximum difference between haplotype frequencies between
two iterations is less than 10^-7. This value can be adjusted according to the desired precision of
estimations.



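At a given iteration s, the Expectation step consists in calculating the genotype frequencies
by the following equation:

$$\mathrm{pr}(\mathrm{genotype}_i)^{(s)} = \mathrm{pr}(\mathrm{phenotype}_j) \cdot \mathrm{pr}(\mathrm{genotype}_i \mid \mathrm{phenotype}_j)^{(s)} = \frac{n_j}{N} \cdot \frac{\mathrm{pr}(h_k, h_l)^{(s)}}{P_j^{(s)}} \qquad \text{Equation 3}$$

where genotype i occurs in phenotype j, n_j is the number of individuals observed with phenotype j,
and h_k and h_l constitute genotype i. Each
probability is derived according to Equations 1 and 2 described above.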



Then the Maximization step simply estimates another set of haplotype frequencies given the
genotype frequencies. This approach is also known as the gene-counting method (Smith, 1957).

$$p_t^{(s+1)} = \frac{1}{2} \sum_{j=1}^{F} \sum_{i=1}^{c_j} \delta_{it} \, \mathrm{pr}(\mathrm{genotype}_i)^{(s)} \qquad \text{Equation 4}$$

where δ_it is an indicator variable which counts the number of times haplotype t occurs in genotype i.
It takes the values of 0, 1 or 2. 
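
The Expectation and Maximization steps of Equations 1 to 4 can be sketched as follows. This is a minimal illustration rather than the EM-HAPLO or Arlequin implementation: the function names, the uniform starting frequencies, and the phenotype coding (0 or 1 for a homozygous site, 2 for a heterozygous site) are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def compatible_pairs(phenotype):
    """All genotypes (unordered haplotype pairs) consistent with a phenotype."""
    het = [i for i, s in enumerate(phenotype) if s == 2]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(phenotype), list(phenotype)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(phenotypes, tol=1e-7, max_iter=1000):
    """Maximum-likelihood haplotype frequencies under Hardy-Weinberg proportions."""
    counts = Counter(phenotypes)
    n = len(phenotypes)
    pairs = {ph: compatible_pairs(ph) for ph in counts}
    haps = sorted({h for ps in pairs.values() for pair in ps for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}   # assumed uniform starting values

    for _ in range(max_iter):
        new = dict.fromkeys(haps, 0.0)
        for ph, n_j in counts.items():
            # Equation 2: genotype probabilities from haplotype frequencies.
            probs = [freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]
                     for a, b in pairs[ph]]
            p_j = sum(probs)                     # Equation 1
            for (a, b), pr in zip(pairs[ph], probs):
                g = (n_j / n) * pr / p_j         # Equation 3 (Expectation step)
                new[a] += 0.5 * g                # Equation 4 (Maximization step)
                new[b] += 0.5 * g
        change = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if change < tol:                         # stop criterion
            break
    return freq


sample = [(0, 0)] * 4 + [(1, 1)] * 4 + [(2, 2)] * 2
estimates = em_haplotype_frequencies(sample)
```

A single run from one starting point is shown; repeating the run from several different starting frequency sets and keeping the best likelihood would require passing the starting values as an extra argument.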

To ensure that the estimation finally obtained is the maximum-likelihood estimation, several
starting values are required. The estimations obtained are compared and, if they are different,
the estimations leading to the best likelihood are kept.

3) Methods To Calculate Linkage Disequilibrium Between Markers

A number of methods can be used to calculate linkage disequilibrium between any two
genetic positions; in practice, linkage disequilibrium is measured by applying a statistical association
test to haplotype data taken from a population.

Linkage disequilibrium between any pair of biallelic markers comprising at least one of the
biallelic markers of the present invention (M_i, M_j) having alleles (a_i/b_i) at marker M_i and alleles
(a_j/b_j) at marker M_j can be calculated for every allele combination (a_i,a_j; a_i,b_j; b_i,a_j and b_i,b_j), according
to the Piazza formula:

$$\Delta_{a_i a_j} = \sqrt{\theta_4} - \sqrt{(\theta_4 + \theta_3)(\theta_4 + \theta_2)}, \text{ where:}$$

θ4 = - - = frequency of genotypes not having allele a_i at M_i and not having allele a_j at M_j
θ3 = - + = frequency of genotypes not having allele a_i at M_i and having allele a_j at M_j
θ2 = + - = frequency of genotypes having allele a_i at M_i and not having allele a_j at M_j



Linkage disequilibrium (LD) between pairs of biallelic markers (M_i, M_j) can also be
calculated for every allele combination (a_i,a_j; a_i,b_j; b_i,a_j and b_i,b_j), according to the maximum-
likelihood estimate (MLE) for delta (the composite genotypic disequilibrium coefficient), as
described by Weir (Weir B. S., 1996). The MLE for the composite linkage disequilibrium is:

$$D_{a_i a_j} = \frac{2n_1 + n_2 + n_3 + n_4/2}{N} - 2\,\mathrm{pr}(a_i)\,\mathrm{pr}(a_j)$$

where n_1 = Σ phenotype (a_i/a_i, a_j/a_j), n_2 = Σ phenotype (a_i/a_i, a_j/b_j), n_3 = Σ phenotype (a_i/b_i,
a_j/a_j), n_4 = Σ phenotype (a_i/b_i, a_j/b_j) and N is the number of individuals in the sample.

This formula allows linkage disequilibrium between alleles to be estimated when only 
genotype, and not haplotype, data are available. 
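
A direct transcription of the composite disequilibrium estimate into code might look as follows. The function and argument names are hypothetical, and the allele frequencies are assumed to have been estimated separately from the same sample.

```python
def composite_ld(n1, n2, n3, n4, n_individuals, p_ai, p_aj):
    """Composite genotypic disequilibrium between alleles a_i and a_j.

    n1: individuals homozygous a_i/a_i and a_j/a_j
    n2: individuals a_i/a_i and a_j/b_j
    n3: individuals a_i/b_i and a_j/a_j
    n4: individuals a_i/b_i and a_j/b_j (double heterozygotes)
    """
    observed = (2 * n1 + n2 + n3 + n4 / 2) / n_individuals
    return observed - 2 * p_ai * p_aj


# Genotype counts expected for two independent markers with allele frequency 0.5
# in 16 individuals give a coefficient of zero.
delta = composite_ld(1, 2, 2, 4, 16, 0.5, 0.5)
```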



Another means of calculating the linkage disequilibrium between markers is as follows. For
a couple of biallelic markers, M_i (a_i/b_i) and M_j (a_j/b_j), fitting the Hardy-Weinberg equilibrium, one
can estimate the four possible haplotype frequencies in a given population according to the approach
described above.

The estimation of gametic disequilibrium between a_i and a_j is simply:

$$D_{a_i a_j} = \mathrm{pr}(\mathrm{haplotype}(a_i, a_j)) - \mathrm{pr}(a_i)\,\mathrm{pr}(a_j)$$

where pr(a_i) is the probability of allele a_i, pr(a_j) is the probability of allele a_j, and
pr(haplotype(a_i, a_j)) is estimated as in Equation 3 above.

For a couple of biallelic markers only one measure of disequilibrium is necessary to describe
the association between M_i and M_j.

Then a normalized value of the above is calculated as follows:

$$D'_{a_i a_j} = \frac{D_{a_i a_j}}{\max\left(-\mathrm{pr}(a_i)\,\mathrm{pr}(a_j),\; -\mathrm{pr}(b_i)\,\mathrm{pr}(b_j)\right)} \quad \text{with } D_{a_i a_j} < 0$$

$$D'_{a_i a_j} = \frac{D_{a_i a_j}}{\min\left(\mathrm{pr}(b_i)\,\mathrm{pr}(a_j),\; \mathrm{pr}(a_i)\,\mathrm{pr}(b_j)\right)} \quad \text{with } D_{a_i a_j} > 0$$

The skilled person will readily appreciate that other linkage disequilibrium calculation
methods can be used. 
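
For illustration, the gametic disequilibrium and its normalized value can be computed from an estimated haplotype frequency and the two allele frequencies. The sketch uses the standard Lewontin normalization, in which D is divided by the largest magnitude it could take given the allele frequencies; the function name and arguments are hypothetical.

```python
def d_and_d_prime(p_haplotype, p_ai, p_aj):
    """Return (D, D') for alleles a_i and a_j of two biallelic markers.

    p_haplotype: estimated frequency of the haplotype carrying a_i and a_j
    p_ai, p_aj: frequencies of alleles a_i and a_j
    """
    p_bi, p_bj = 1.0 - p_ai, 1.0 - p_aj
    d = p_haplotype - p_ai * p_aj
    if d < 0:
        bound = max(-p_ai * p_aj, -p_bi * p_bj)   # most negative value D can take
    else:
        bound = min(p_bi * p_aj, p_ai * p_bj)     # largest positive value D can take
    d_prime = d / bound if bound != 0 else 0.0
    return d, d_prime


d, d_prime = d_and_d_prime(0.5, 0.5, 0.5)
```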

Linkage disequilibrium among a set of biallelic markers having an adequate heterozygosity 
rate can be determined by genotyping between 50 and 1000 unrelated individuals, preferably 
between 75 and 200, more preferably around 100. 



4) Testing For Association 

The statistical significance of a correlation between a phenotype
and a genotype, in this case an allele at a biallelic marker or a haplotype made up of such alleles,
may be determined by any statistical test known in the art and with any accepted threshold of
statistical significance being required. The application of particular methods and thresholds of
significance is well within the skill of the ordinary practitioner of the art.

Testing for association is performed by determining the frequency of a biallelic marker allele
in case and control populations and comparing these frequencies with a statistical test to determine if
there is a statistically significant difference in frequency which would indicate a correlation between
the trait and the biallelic marker allele under study. Similarly, a haplotype analysis is performed by
estimating the frequencies of all possible haplotypes for a given set of biallelic markers in case and
control populations, and comparing these frequencies with a statistical test to determine if there is a
statistically significant correlation between the haplotype and the phenotype (trait) under study. Any
statistical tool useful to test for a statistically significant association between a genotype and a
phenotype may be used. Preferably the statistical test employed is a chi-square test with one degree
of freedom. A P-value is calculated (the P-value is the probability that a statistic as large or larger
than the observed one would occur by chance). 
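
A chi-square test with one degree of freedom on allele counts in cases and controls can be sketched as below. This is one possible form of the test (a Pearson chi-square on a 2 x 2 table without continuity correction); the function name and the example counts are hypothetical.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df) and P-value for a 2 x 2 table of allele counts.

    case_a / case_b: counts of the two alleles in the case population
    control_a / control_b: counts of the two alleles in the control population
    """
    n = case_a + case_b + control_a + control_b
    numerator = n * (case_a * control_b - case_b * control_a) ** 2
    denominator = ((case_a + case_b) * (control_a + control_b)
                   * (case_a + control_a) * (case_b + control_b))
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p_value = allelic_chi_square(30, 70, 20, 80)
```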

Statistical Significance

In preferred embodiments, to establish significance for diagnosis purposes, either as a positive basis for
further diagnostic tests or as a preliminary starting point for early preventive therapy, the p value
related to a biallelic marker association is preferably about 1 x 10^-2 or less, more preferably about 1 x
10^-4 or less, for a single biallelic marker analysis and about 1 x 10^-3 or less, still more preferably 1 x
10^-6 or less and most preferably of about 1 x 10^-8 or less, for a haplotype analysis involving two or
more markers. These values are believed to be applicable to any association studies involving single
or multiple marker combinations.

The skilled person can use the range of values set forth above as a starting point in order to 
carry out association studies with biallelic markers of the present invention. In doing so, significant 
associations between the biallelic markers of the present invention and a trait can be revealed and 
used for diagnosis and drug screening purposes.

Phenotypic Permutation

In order to confirm the statistical significance of the first stage haplotype analysis described 
above, it might be suitable to perform further analyses in which genotyping data from case-control 
individuals are pooled and randomized with respect to the trait phenotype. Each individual's
genotyping data are randomly allocated to two groups, which contain the same number of individuals
as the case-control populations used to compile the data obtained in the first stage. A second stage
haplotype analysis is preferably run on these artificial groups, preferably for the markers included in
the haplotype of the first stage analysis showing the highest relative risk coefficient. This
experiment is reiterated preferably between 100 and 10000 times. The repeated iterations
allow the determination of the probability of obtaining the tested haplotype by chance.
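
The permutation procedure can be sketched as follows. The sketch is illustrative only: each individual is reduced to a single value (1 if the individual carries the tested haplotype, 0 otherwise), the test statistic is a simple difference in carrier frequency, and the add-one correction in the returned probability is a common convention rather than something specified in this text. All names are hypothetical.

```python
import random


def carrier_frequency_difference(cases, controls):
    """Absolute difference in haplotype carrier frequency between two groups."""
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(cases, controls, statistic, n_perm=1000, seed=0):
    """Probability of a statistic at least as large as observed under random labels."""
    rng = random.Random(seed)
    observed = statistic(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Randomly reallocate individuals to two groups of the original sizes.
        rng.shuffle(pooled)
        if statistic(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


cases = [1] * 14 + [0] * 6
controls = [1] * 5 + [0] * 15
p_value = permutation_p_value(cases, controls, carrier_frequency_difference)
```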

Assessment Of Statistical Association 

To address the problem of false positives, similar analysis may be performed with the same
case-control populations in random genomic regions. Results in random regions and the candidate
region are compared as described in a co-pending US Provisional Patent Application entitled
"Methods, Software And Apparati For Identifying Genomic Regions Harboring A Gene Associated
With A Detectable Trait," U.S. Serial Number 60/107,986, filed November 10, 1998, the contents 
of which are incorporated herein by reference. 

5) Evaluation Of Risk Factors 

The association between a risk factor (in genetic epidemiology the risk factor is the presence
or the absence of a certain allele or haplotype at marker loci) and a disease is measured by the odds
ratio (OR) and by the relative risk (RR). If P(R+) is the probability of developing the disease for
individuals with the risk factor R and P(R-) is the probability for individuals without the risk factor, then the
relative risk is simply the ratio of the two probabilities, that is:

$$RR = P(R^+) / P(R^-)$$

In case-control studies, direct measures of the relative risk cannot be obtained because of the
sampling design. However, the odds ratio allows a good approximation of the relative risk for low-
incidence diseases and can be calculated:

$$OR = \frac{F^+ / (1 - F^+)}{F^- / (1 - F^-)}$$

F+ is the frequency of the exposure to the risk factor in cases and F- is the frequency of the
exposure to the risk factor in controls. F+ and F- are calculated using the allelic or haplotype
frequencies of the study and further depend on the underlying genetic model (dominant, recessive,
additive, etc.).

One can further estimate the attributable risk (AR) which describes the proportion of 
individuals in a population exhibiting a trait due to a given risk factor. This measure is important in
quantifying the role of a specific factor in disease etiology and in terms of the public health impact
of a risk factor. The public health relevance of this measure lies in estimating the proportion of
cases of disease in the population that could be prevented if the exposure of interest were absent.
AR is determined as follows:

$$AR = \frac{P_E (RR - 1)}{P_E (RR - 1) + 1}$$

AR is the risk attributable to a biallelic marker allele or a biallelic marker haplotype. P_E is
the frequency of exposure to an allele or a haplotype within the population at large; and RR is the
relative risk which is approximated with the odds ratio when the trait under study has a relatively
low incidence in the general population. 
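The attributable risk formula can be sketched in Python as follows. The function name and the example values are hypothetical; the relative risk passed in may be an odds ratio when the low-incidence approximation described above is acceptable.

```python
def attributable_risk(p_exposed: float, relative_risk: float) -> float:
    """AR = P_E (RR - 1) / (P_E (RR - 1) + 1)."""
    if not 0.0 <= p_exposed <= 1.0:
        raise ValueError("p_exposed must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = p_exposed * (relative_risk - 1.0)
    return excess / (excess + 1.0)


if __name__ == "__main__":
    # Hypothetical example: 25% of the population exposed, relative risk of 2.
    print(f"{attributable_risk(0.25, 2.0):.2f}")
```

With a relative risk of 1 the excess term vanishes and the attributable risk is 0, as expected for a factor with no effect.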

Identification Of Biallelic Markers In Linkage Disequilibrium With The Biallelic Markers Of
The Invention

Once a first biallelic marker has been identified in a genomic region of interest, the
practitioner of ordinary skill in the art, using the teachings of the present invention, can easily
identify additional biallelic markers in linkage disequilibrium with this first marker. As mentioned
before, any marker in linkage disequilibrium with a first marker associated with a trait will be
associated with the trait. Therefore, once an association has been demonstrated between a given
biallelic marker and a trait, the discovery of additional biallelic markers associated with this trait is
of great interest in order to increase the density of biallelic markers in this particular region. The
causal gene or mutation will be found in the vicinity of the marker or set of markers showing the
highest correlation with the trait.

Identification of additional markers in linkage disequilibrium with a given marker involves:

(a) amplifying a genomic fragment comprising a first biallelic marker from a plurality of
individuals;

(b) identifying second biallelic markers in the genomic region harboring said first biallelic
marker;

(c) conducting a linkage disequilibrium analysis between said first biallelic marker and
second biallelic markers; and

(d) selecting said second biallelic markers as being in linkage disequilibrium with said first
marker.

Subcombinations comprising steps (b) and (c) are also contemplated.
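Steps (c) and (d) can be illustrated with a short Python sketch. It uses the commonly used pairwise measures D, D' and r², which are an assumption here and not necessarily the statistics applied in the present specification. The marker names, the frequencies and the 0.8 selection threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic markers.

    p_ab is the frequency of the haplotype carrying allele A at the first marker
    and allele B at the second marker; p_a and p_b are the allele frequencies.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both markers must be polymorphic (0 < frequency < 1)")
    d = p_ab - p_a * p_b
    # D is normalized by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


def select_markers_in_ld(
    candidates: Dict[str, Tuple[float, float, float]], min_r2: float = 0.8
) -> List[str]:
    """Keep the second markers whose r^2 with the first marker reaches min_r2.

    candidates maps a marker name to (p_ab, p_a, p_b) measured against the first marker.
    """
    return [
        name
        for name, freqs in candidates.items()
        if linkage_disequilibrium(*freqs)[2] >= min_r2
    ]


if __name__ == "__main__":
    # Hypothetical second markers: M1 is in complete LD, M2 is independent.
    second_markers = {"M1": (0.5, 0.5, 0.5), "M2": (0.25, 0.5, 0.5)}
    print(select_markers_in_ld(second_markers))
```

In practice the haplotype frequencies would first have to be estimated from genotype data, which this sketch does not attempt.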

Methods to identify biallelic markers and to conduct linkage disequilibrium analysis are
described herein and can be carried out by the skilled person without undue experimentation. The
present invention then also concerns biallelic markers which are in linkage disequilibrium with the
specific biallelic markers A1 to A32 and which are expected to present similar characteristics in
terms of their respective association with a given trait. In a preferred embodiment, the invention
concerns biallelic markers which are in linkage disequilibrium with the specific biallelic markers






Identification Of Functional Mutations 
Mutations in the HKLP gene which are responsible for a detectable phenotype or trait may
be identified by comparing the sequences of the HKLP gene from trait positive and control
individuals. Once a positive association is confirmed with a biallelic marker of the present
invention, the identified locus can be scanned for mutations. In a preferred embodiment, functional
regions such as exons and splice sites, promoters and other regulatory regions of the HKLP gene are
scanned for mutations. In a preferred embodiment the sequence of the HKLP gene is compared in
trait positive and control individuals. Preferably, trait positive individuals carry the haplotype
shown to be associated with the trait and trait negative individuals do not carry the haplotype or
allele associated with the trait. The detectable trait or phenotype may comprise a variety of
manifestations of altered HKLP function.

The mutation detection procedure is essentially similar to that used for biallelic marker
identification. The method used to detect such mutations generally comprises the following steps:

- amplification of a region of the HKLP gene comprising a biallelic marker or a group of
biallelic markers associated with the trait from DNA samples of trait-positive patients and
trait-negative controls;

- sequencing of the amplified region;

- comparison of DNA sequences from trait-positive and control individuals;

- determination of mutations specific to trait-positive patients.

In one embodiment, said biallelic marker is selected from the group consisting of A1 to A32,
and the complements thereof. It is preferred that candidate polymorphisms be then verified by
screening a larger population of cases and controls by means of any genotyping procedure such as
those described herein, preferably using a microsequencing technique in an individual test format.
Polymorphisms are considered as candidate mutations when present in cases and controls at
frequencies compatible with the expected association results. Polymorphisms are considered as
candidate "trait-causing" mutations when they exhibit a statistically significant correlation with the
detectable phenotype.
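The comparison and determination steps listed above can be sketched in Python as a set difference. This is only an illustration under simplifying assumptions: each individual is represented by a set of already-called variants, and the (position, reference, alternative) tuples are hypothetical. Candidates found this way would still need the larger verification screen described above.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant representation: (position label, reference base, alternative base).
Variant = Tuple[str, str, str]


def case_specific_variants(
    case_variants: Iterable[Set[Variant]], control_variants: Iterable[Set[Variant]]
) -> Set[Variant]:
    """Return the variants seen in at least one case and in no control."""
    seen_in_cases: Set[Variant] = set().union(*case_variants)
    seen_in_controls: Set[Variant] = set().union(*control_variants)
    return seen_in_cases - seen_in_controls


if __name__ == "__main__":
    cases = [
        {("g.100", "A", "G"), ("g.200", "C", "T")},
        {("g.100", "A", "G")},
    ]
    controls = [{("g.200", "C", "T")}]
    print(case_specific_variants(cases, controls))
```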

Recombinant Vectors

The term "vector" is used herein to designate either a circular or a linear DNA or RNA
molecule, which is either double-stranded or single-stranded, and which comprises at least one
polynucleotide of interest that is sought to be transferred in a cell host or in a unicellular or
multicellular host organism.

The present invention encompasses a family of recombinant vectors that comprise a
regulatory polynucleotide derived from the HKLP genomic sequence, and/or a coding
polynucleotide from either the HKLP genomic sequence or the cDNA sequence.

Generally, a recombinant vector of the invention may comprise any of the polynucleotides
described herein, including regulatory sequences, coding sequences and polynucleotide constructs,
as well as any HKLP primer or probe as defined above. More particularly, the recombinant vectors
of the present invention can comprise any of the polynucleotides described in the "Genomic
Sequences Of The HKLP Gene" section, the "HKLP cDNA Sequences" section, the "Coding
Regions" section, the "Polynucleotide constructs" section, and the "Oligonucleotide Probes And
Primers" section.

In a first preferred embodiment, a recombinant vector of the invention is used to amplify the
inserted polynucleotide derived from a HKLP genomic sequence of SEQ ID Nos 1 and 2 or a HKLP
cDNA, for example the cDNA of SEQ ID No 3, in a suitable cell host, this polynucleotide being
amplified every time that the recombinant vector replicates.

A second preferred embodiment of the recombinant vectors according to the invention
consists of expression vectors comprising either a regulatory polynucleotide or a coding nucleic acid
of the invention, or both. Within certain embodiments, expression vectors are employed to express
the HKLP polypeptide, which can then be purified and, for example, be used in ligand screening
assays or as an immunogen in order to raise specific antibodies directed against the HKLP protein.
In other embodiments, the expression vectors are used for constructing transgenic animals and also
for gene therapy. Expression requires that appropriate signals are provided in the vectors, said
signals including various regulatory elements, such as enhancers/promoters from both viral and
mammalian sources that drive expression of the genes of interest in host cells. Dominant drug
selection markers for establishing permanent, stable cell clones expressing the products are generally
included in the expression vectors of the invention, as they are elements that link expression of the
drug selection markers to expression of the polypeptide.

More particularly, the present invention relates to expression vectors which include nucleic
acids encoding a HKLP protein, preferably the HKLP protein of the amino acid sequence of SEQ ID
No 4 or variants or fragments thereof.

The invention also pertains to a recombinant expression vector useful for the expression of
the HKLP coding sequence, wherein said vector comprises a nucleic acid of SEQ ID No 3.

Recombinant vectors comprising a nucleic acid containing a HKLP-related biallelic marker
are also part of the invention. In a preferred embodiment, said biallelic marker is selected from the
group consisting of A1 to A32, and the complements thereof.

Some of the elements which can be found in the vectors of the present invention are 

described in further detail in the following sections. 

1. General features of the expression vectors of the invention

A recombinant vector according to the invention comprises, but is not limited to, a YAC
(Yeast Artificial Chromosome), a BAC (Bacterial Artificial Chromosome), a phage, a phagemid, a
cosmid, a plasmid or even a linear DNA molecule which may consist of a chromosomal,
non-chromosomal, semi-synthetic and synthetic DNA. Such a recombinant vector can comprise a
transcriptional unit comprising an assembly of:

(1) a genetic element or elements having a regulatory role in gene expression, for example
promoters or enhancers. Enhancers are cis-acting elements of DNA, usually from about 10 to 300
bp in length, that act on the promoter to increase transcription;

(2) a structural or coding sequence which is transcribed into mRNA and eventually
translated into a polypeptide, said structural or coding sequence being operably linked to the
regulatory elements described in (1); and

(3) appropriate transcription initiation and termination sequences. Structural units intended
for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling
extracellular secretion of translated protein by a host cell. Alternatively, when a recombinant protein
is expressed without a leader or transport sequence, it may include an N-terminal residue. This
residue may or may not be subsequently cleaved from the expressed recombinant protein to provide
a final product.

Generally, recombinant expression vectors will include origins of replication, selectable
markers permitting transformation of the host cell, and a promoter derived from a highly expressed
gene to direct transcription of a downstream structural sequence. The heterologous structural
sequence is assembled in appropriate phase with translation initiation and termination sequences,
and preferably a leader sequence capable of directing secretion of the translated protein into the
periplasmic space or the extracellular medium. In a specific embodiment wherein the vector is
adapted for transfecting and expressing desired sequences in mammalian host cells, preferred vectors
will comprise an origin of replication in the desired host, a suitable promoter and enhancer, and also
any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5'-flanking non-transcribed sequences. DNA sequences
derived from the SV40 viral genome, for example SV40 origin, early promoter, enhancer, splice and
polyadenylation sites may be used to provide the required non-transcribed genetic elements.

The in vivo expression of a HKLP polypeptide of SEQ ID No 4 or fragments or variants
thereof may be useful in order to correct a genetic defect related to the expression of the native gene
in a host organism or to the production of a biologically inactive HKLP protein.

Consequently, the present invention also deals with recombinant expression vectors mainly
designed for the in vivo production of the HKLP polypeptide of SEQ ID No 4 or fragments or
variants thereof by the introduction of the appropriate genetic material in the organism of the patient
to be treated. This genetic material may be introduced in vitro in a cell that has been previously
extracted from the organism, the modified cell being subsequently reintroduced in the said organism,
or directly in vivo into the appropriate tissue.

2. Regulatory Elements

Promoters

The suitable promoter regions used in the expression vectors according to the present
invention are chosen taking into account the cell host in which the heterologous gene has to be
expressed. The particular promoter employed to control the expression of a nucleic acid sequence of
interest is not believed to be important, so long as it is capable of directing the expression of the
nucleic acid in the targeted cell. Thus, where a human cell is targeted, it is preferable to position the
nucleic acid coding region adjacent to and under the control of a promoter that is capable of being
expressed in a human cell, such as, for example, a human or a viral promoter.

A suitable promoter may be heterologous with respect to the nucleic acid for which it
controls the expression or alternatively can be endogenous to the native polynucleotide containing
the coding sequence to be expressed. Additionally, the promoter is generally heterologous with
respect to the recombinant vector sequences within which the construct promoter/coding sequence
has been inserted.

Promoter regions can be selected from any desired gene using, for example, CAT
(chloramphenicol acetyltransferase) vectors and more preferably pKK232-8 and pCM7 vectors.

Preferred bacterial promoters are the LacI, LacZ, the T3 or T7 bacteriophage RNA
polymerase promoters, the gpt, lambda PR, PL and trp promoters (EP 0036776), the polyhedrin
promoter, or the p10 protein promoter from baculovirus (Kit Novagen) (Smith et al., 1983; O'Reilly
et al., 1992), the lambda PR promoter or also the trc promoter.

Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late
SV40, LTRs from retrovirus, and mouse metallothionein-L. Selection of a convenient vector and
promoter is well within the level of ordinary skill in the art.

The choice of a promoter is well within the ability of a person skilled in the field of genetic
engineering. For example, one may refer to the book of Sambrook et al. (1989) or also to the
procedures described by Fuller et al. (1996).
Other regulatory elements 

Where a cDNA insert is employed, one will typically desire to include a polyadenylation
signal to effect proper polyadenylation of the gene transcript. The nature of the polyadenylation
signal is not believed to be crucial to the successful practice of the invention, and any such sequence
may be employed such as human growth hormone and SV40 polyadenylation signals. Also
contemplated as an element of the expression cassette is a terminator. These elements can serve to
enhance message levels and to minimize read through from the cassette into other sequences.

3. Selectable Markers 

Such markers would confer an identifiable change to the cell permitting easy identification
of cells containing the expression construct. The selectable marker genes for selection of
transformed host cells are preferably dihydrofolate reductase or neomycin resistance for eukaryotic
cell culture, TRP1 for S. cerevisiae or tetracycline, rifampicin or ampicillin resistance in E. coli, or
levan saccharase for mycobacteria, this latter marker being a negative selection marker.

4. Preferred Vectors. 

Bacterial vectors

As a representative but non-limiting example, useful expression vectors for bacterial use can
comprise a selectable marker and a bacterial origin of replication derived from commercially
available plasmids comprising genetic elements of pBR322 (ATCC 37017). Such commercial
vectors include, for example, pKK223-3 (Pharmacia, Uppsala, Sweden), and GEM1 (Promega
Biotec, Madison, WI, USA).

Large numbers of other suitable vectors are known to those of skill in the art, and
commercially available, such as the following bacterial vectors: pQE70, pQE60, pQE-9 (Qiagen),
pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks, pNH8A, pNH16A, pNH18A, pNH46A
(Stratagene); ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia); pWLNEO, pSV2CAT,
pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia); pQE-30
(QIAexpress).

Bacteriophage vectors 

The P1 bacteriophage vector may contain large inserts ranging from about 80 to about 100
kb.

The construction of P1 bacteriophage vectors such as p158 or p158/neo8 is notably
described by Sternberg (1992, 1994). Recombinant P1 clones comprising HKLP nucleotide
sequences may be designed for inserting large polynucleotides of more than 40 kb (Linton et al.,
1993). To generate P1 DNA for transgenic experiments, a preferred protocol is the protocol
described by McCormick et al. (1994). Briefly, E. coli (preferably strain NS3529) harboring the P1
plasmid are grown overnight in a suitable broth medium containing 25 µg/ml of kanamycin. The P1
DNA is prepared from the E. coli by alkaline lysis using the Qiagen Plasmid Maxi kit (Qiagen,
Chatsworth, CA, USA), according to the manufacturer's instructions. The P1 DNA is purified from
the bacterial lysate on two Qiagen-tip 500 columns, using the washing and elution buffers contained
in the kit. A phenol/chloroform extraction is then performed before precipitating the DNA with 70%
ethanol. After solubilizing the DNA in TE (10 mM Tris-HCl, pH 7.4, 1 mM EDTA), the
concentration of the DNA is assessed by spectrophotometry.
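The spectrophotometric step can be illustrated with a short Python sketch. It assumes the widely used convention that an absorbance of 1.0 at 260 nm over a 1 cm path corresponds to about 50 µg/ml of double-stranded DNA; this conversion factor, the function name and the example reading are assumptions and are not stated in the protocol above.

```python
# Conventional conversion factor for double-stranded DNA (assumption, see lead-in).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dsdna_concentration_ug_per_ml(
    a260: float, dilution_factor: float = 1.0, path_length_cm: float = 1.0
) -> float:
    """Estimate the dsDNA concentration of the undiluted sample in µg/ml."""
    if a260 < 0 or dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("absorbance must be >= 0; dilution and path length must be > 0")
    return (a260 / path_length_cm) * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


if __name__ == "__main__":
    # Hypothetical reading: A260 of 0.2 measured on a 1:20 dilution of the P1 DNA.
    print(dsdna_concentration_ug_per_ml(0.2, dilution_factor=20.0))
```

Readings far outside the linear range of the instrument would make this estimate unreliable.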

When the goal is to express a P1 clone comprising HKLP nucleotide sequences in a
transgenic animal, typically in transgenic mice, it is desirable to remove vector sequences from the
P1 DNA fragment, for example by cleaving the P1 DNA at rare-cutting sites within the P1
polylinker (SfiI, NotI or SalI). The P1 insert is then purified from vector sequences on a pulsed-field
agarose gel, using methods similar to those originally reported for the




isolation of DNA from YACs (Schedl et al., 1993a; Peterson et al., 1993). At this stage, the
resulting purified insert DNA can be concentrated, if necessary, on a Millipore Ultrafree-MC Filter
Unit (Millipore, Bedford, MA, USA - 30,000 molecular weight limit) and then dialyzed against
microinjection buffer (10 mM Tris-HCl, pH 7.4; 250 µM EDTA) containing 100 mM NaCl, 30 µM
spermine, 70 µM spermidine on a microdialysis membrane (type VS, 0.025 µm from Millipore).
The intactness of the purified P1 DNA insert is assessed by electrophoresis on 1% agarose (SeaKem
GTG; FMC Bio-products) pulsed-field gel and staining with ethidium bromide.
Baculovirus vectors 

A suitable vector for the expression of the HKLP polypeptide of SEQ ID No 3 or fragments
or variants thereof is a baculovirus vector that can be propagated in insect cells and in insect cell
lines. A specific suitable host vector system is the pVL1392/1393 baculovirus transfer vector
(Pharmingen) that is used to transfect the SF9 cell line (ATCC N°CRL 1711) which is derived from
Spodoptera frugiperda.

Other suitable vectors for the expression of the HKLP polypeptide of SEQ ID No 3 or
fragments or variants thereof in a baculovirus expression system include those described by Chai et
al. (1993), Vlasak et al. (1983) and Lenhard et al. (1996).
Viral vectors 

In one specific embodiment, the vector is derived from an adenovirus. Preferred adenovirus
vectors according to the invention are those described by Feldman and Steg (1996) or Ohno et
al. (1994). Another preferred recombinant adenovirus according to this specific embodiment of the
present invention is the human adenovirus type 2 or 5 (Ad 2 or Ad 5) or an adenovirus of animal
origin (French patent application N° FR-93.05954).

Retrovirus vectors and adeno-associated virus vectors are generally understood to be the
recombinant gene delivery systems of choice for the transfer of exogenous polynucleotides in vivo,
particularly to mammals, including humans. These vectors provide efficient delivery of genes into
cells, and the transferred nucleic acids are stably integrated into the chromosomal DNA of the host.

Particularly preferred retroviruses for the preparation or construction of retroviral in vitro or
in vivo gene delivery vehicles of the present invention include retroviruses selected from the group
consisting of Mink-Cell Focus Inducing Virus, Murine Sarcoma Virus, Reticuloendotheliosis virus
and Rous Sarcoma virus. Particularly preferred Murine Leukemia Viruses include the 4070A and
the 1504A viruses, Abelson (ATCC No VR-999), Friend (ATCC No VR-245), Gross (ATCC No
VR-590), Rauscher (ATCC No VR-998) and Moloney Murine Leukemia Virus (ATCC No VR-190;
PCT Application No WO 94/24298). Particularly preferred Rous Sarcoma Viruses include Bryan
high titer (ATCC Nos VR-334, VR-657, VR-726, VR-659 and VR-728). Other preferred retroviral
vectors are those described in Roth et al. (1996), PCT Application No WO 93/25234, PCT
Application No WO 94/06920, Roux et al., 1989, Julan et al., 1992 and Neda et al., 1991.

Yet another viral vector system that is contemplated by the invention consists in the
adeno-associated virus (AAV). The adeno-associated virus is a naturally occurring defective virus that
requires another virus, such as an adenovirus or a herpes virus, as a helper virus for efficient
replication and a productive life cycle (Muzyczka et al., 1992). It is also one of the few viruses that
may integrate its DNA into non-dividing cells, and exhibits a high frequency of stable integration
(Flotte et al., 1992; Samulski et al., 1989; McLaughlin et al., 1989). One advantageous feature of
AAV derives from its reduced efficacy for transducing primary cells relative to transformed cells.

BAC vectors 

The bacterial artificial chromosome (BAC) cloning system (Shizuya et al., 1992) has been
developed to stably maintain large fragments of genomic DNA (100-300 kb) in E. coli. A preferred
BAC vector consists of the pBeloBAC11 vector that has been described by Kim et al. (1996). BAC
libraries are prepared with this vector using size-selected genomic DNA that has been partially
digested using enzymes that permit ligation into either the Bam HI or HindIII sites in the vector.
Flanking these cloning sites are T7 and SP6 RNA polymerase transcription initiation sites that can
be used to generate end probes by either RNA transcription or PCR methods. After the construction
of a BAC library in E. coli, BAC DNA is purified from the host cell as a supercoiled circle.
Converting these circular molecules into a linear form precedes both size determination and
introduction of the BACs into recipient cells. The cloning site is flanked by two Not I sites,
permitting cloned segments to be excised from the vector by Not I digestion. Alternatively, the
DNA insert contained in the pBeloBAC11 vector may be linearized by treatment of the BAC vector
with the commercially available enzyme lambda terminase that leads to the cleavage at the unique
cosN site, but this cleavage method results in a full length BAC clone containing both the insert
DNA and the BAC sequences.

5. Delivery Of The Recombinant Vectors 

In order to effect expression of the polynucleotides and polynucleotide constructs of the
invention, these constructs must be delivered into a cell. This delivery may be accomplished in
vitro, as in laboratory procedures for transforming cell lines, or in vivo or ex vivo, as in the treatment
of certain disease states.

One mechanism is viral infection where the expression construct is encapsulated in an
infectious viral particle.

Several non-viral methods for the transfer of polynucleotides into cultured mammalian cells
are also contemplated by the present invention, and include, without being limited to, calcium
phosphate precipitation (Graham et al., 1973; Chen et al., 1987), DEAE-dextran (Gopal, 1985),
electroporation (Tur-Kaspa et al., 1986; Potter et al., 1984), direct microinjection (Harland et al.,
1985), DNA-loaded liposomes (Nicolau et al., 1982; Fraley et al., 1979), and receptor-mediated
transfection (Wu and Wu, 1987; 1988). Some of these techniques may be successfully adapted for
in vivo or ex vivo use.

Once the expression polynucleotide has been delivered into the cell, it may be stably
integrated into the genome of the recipient cell. This integration may be in the cognate location and
orientation via homologous recombination (gene replacement) or it may be integrated in a random,
non-specific location (gene augmentation). In yet further embodiments, the nucleic acid may be
stably maintained in the cell as a separate, episomal segment of DNA. Such nucleic acid segments
or "episomes" encode sequences sufficient to permit maintenance and replication independent of or
in synchronization with the host cell cycle.

One specific embodiment for a method for delivering a protein or peptide to the interior of a
cell of a vertebrate in vivo comprises the step of introducing a preparation comprising a
physiologically acceptable carrier and a naked polynucleotide operatively coding for the polypeptide
of interest into the interstitial space of a tissue comprising the cell, whereby the naked
polynucleotide is taken up into the interior of the cell and has a physiological effect. This is
particularly applicable for transfer in vitro but it may be applied to in vivo as well.

Compositions for use in vitro and in vivo comprising a "naked" polynucleotide are described
in PCT application No. WO 90/11092 (Vical Inc.) and also in PCT application No. WO 95/11307
(Institut Pasteur, INSERM, Université d'Ottawa) as well as in the articles of Tacson et al. (1996) and
of Huygen et al. (1996).

In still another embodiment of the invention, the transfer of a naked polynucleotide of the
invention, including a polynucleotide construct of the invention, into cells may be carried out by
particle bombardment (biolistic), said particles being DNA-coated microprojectiles accelerated to a
high velocity allowing them to pierce cell membranes and enter cells without killing them, such as
described by Klein et al. (1987).

In a further embodiment, the polynucleotide of the invention may be entrapped in a
liposome (Ghosh and Bacchawat, 1991; Wong et al., 1980; Nicolau et al., 1987).

In a specific embodiment, the invention provides a composition for the in vivo production of
the HKLP protein or polypeptide described herein. It comprises a naked polynucleotide operatively
coding for this polypeptide, in solution in a physiologically acceptable carrier, and suitable for
introduction into a tissue to cause cells of the tissue to express the said protein or polypeptide.

The amount of vector to be injected into the desired host organism varies according to the site
of injection. As an indicative dose, between 0.1 and 100 µg of the vector will be injected in an
animal body, preferably a mammal body, for example a mouse body.

In another embodiment, the vector according to the invention may be introduced in
vitro in a host cell, preferably in a host cell previously harvested from the animal to be treated and
more preferably a somatic cell such as a muscle cell. In a subsequent step, the cell that has been
transformed with the vector coding for the desired HKLP polypeptide or the desired fragment
thereof is reintroduced into the animal body in order to deliver the recombinant protein within the
body either locally or systemically.

Cell Hosts 

Another object of the invention consists of a host cell that has been transformed or
transfected with one of the polynucleotides described herein, and in particular a polynucleotide
either comprising a HKLP regulatory polynucleotide or the coding sequence of the HKLP
polypeptide selected from the group consisting of SEQ ID Nos 1-3 or a fragment or a variant
thereof. Also included are host cells that are transformed (prokaryotic cells) or that are transfected
(eukaryotic cells) with a recombinant vector such as one of those described above. More
particularly, the cell hosts of the present invention can comprise any of the polynucleotides
described in the "Genomic Sequences Of The HKLP Gene" section, the "HKLP cDNA Sequences"
section, the "Coding Regions" section, the "Polynucleotide constructs" section, the "Oligonucleotide
Probes And Primers" section and the "Recombinant Vectors" section.

A further recombinant cell host according to the invention comprises a polynucleotide
containing a biallelic marker selected from the group consisting of A1 to A32, and the complements
thereof.

Preferred host cells used as recipients for the expression vectors of the invention are the 
following: 

a) Prokaryotic host cells: Escherichia coli strains (e.g. DH5-α strain), Bacillus subtilis,
Salmonella typhimurium, and strains from species like Pseudomonas, Streptomyces and
Staphylococcus.

b) Eukaryotic host cells: HeLa cells (ATCC N°CCL2; N°CCL2.1; N°CCL2.2), CV-1 cells
(ATCC N°CCL70), COS cells (ATCC N°CRL1650; N°CRL1651), Sf-9 cells (ATCC N°CRL1711),
C127 cells (ATCC N° CRL-1804), 3T3 (ATCC N° CRL-6361), CHO (ATCC N° CCL-61), human
kidney 293 (ATCC N° 45504; N° CRL-1573) and BHK (ECACC N° 84100501; N° 84111301).

c) Other mammalian host cells. 

The HKLP gene expression in mammalian, and typically human, cells may be rendered
defective, or alternatively a HKLP genomic or cDNA sequence may be inserted, with the HKLP
gene counterpart in the genome of an animal cell being replaced by a
HKLP polynucleotide according to the invention. These genetic alterations may be generated by
homologous recombination events using specific DNA constructs that have been previously
described.

One kind of cell hosts that may be used are mammal zygotes, such as murine zygotes. For
example, murine zygotes may undergo microinjection with a purified DNA molecule of interest, for
example a purified DNA molecule that has previously been adjusted to a concentration range from 1
ng/ml (for BAC inserts) to 3 ng/µl (for P1 bacteriophage inserts) in 10 mM Tris-HCl, pH 7.4, 250 µM
EDTA containing 100 mM NaCl, 30 µM spermine, and 70 µM spermidine. When the DNA to be
microinjected has a large size, polyamines and high salt concentrations can be used in order to avoid
mechanical breakage of this DNA, as described by Schedl et al. (1993b).

Any one of the polynucleotides of the invention, including the DNA constructs described
herein, may be introduced in an embryonic stem (ES) cell line, preferably a mouse ES cell line. ES
cell lines are derived from pluripotent, uncommitted cells of the inner cell mass of pre-implantation
blastocysts. Preferred ES cell lines are the following: ES-E14TG2a (ATCC n° CRL-1821), ES-D3
(ATCC n° CRL1934 and n° CRL-11632), YS001 (ATCC n° CRL-11776), 36.5 (ATCC n°
CRL-11116). To maintain ES cells in an uncommitted state, they are cultured in the presence of
growth-inhibited feeder cells which provide the appropriate signals to preserve this embryonic phenotype
and serve as a matrix for ES cell adherence. Preferred feeder cells consist of primary embryonic
fibroblasts that are established from tissue of day 13 - day 14 embryos of virtually any mouse strain,
that are maintained in culture, such as described by Abbondanzo et al. (1993) and are inhibited in
growth by irradiation, such as described by Robertson (1987), or by the presence of an inhibitory
concentration of LIF, such as described by Pease and Williams (1990).

The constructs in the host cells can be used in a conventional manner to produce the gene 
product encoded by the recombinant sequence. 

Following transformation of a suitable host and growth of the host to an appropriate cell 
density, the selected promoter is induced by appropriate means, such as temperature shift or 
20 chemical induction, and cells are cultivated for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and
the resulting crude extract is retained for further purification.

Microbial cells employed in the expression of proteins can be disrupted by any convenient
method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing
agents. Such methods are well known to the skilled artisan.

Transgenic Animals 

The terms "transgenic animals" or "host animals" are used herein to designate animals that
have their genome genetically and artificially manipulated so as to include one of the nucleic acids
according to the invention. Preferred animals are non-human mammals and include those belonging
to a genus selected from Mus (e.g. mice), Rattus (e.g. rats) and Oryctolagus (e.g. rabbits) which have
their genome artificially and genetically altered by the insertion of a nucleic acid according to the
invention. In one embodiment, the invention encompasses non-human host mammals and animals
comprising a recombinant vector of the invention or a HKLP gene disrupted by homologous
recombination with a knock out vector.

The transgenic animals of the invention all include within a plurality of their cells a cloned
recombinant or synthetic DNA sequence, more specifically one of the purified or isolated nucleic
acids comprising a HKLP coding sequence, a HKLP regulatory polynucleotide, a polynucleotide
construct, or a DNA sequence encoding an antisense polynucleotide such as described in the present
specification.

Generally, a transgenic animal according to the present invention comprises any one of the
polynucleotides, the recombinant vectors and the cell hosts described in the present invention. More
particularly, the transgenic animals of the present invention can comprise any of the polynucleotides
described in the "Genomic Sequences Of The HKLP Gene" section, the "HKLP cDNA Sequences"
section, the "Coding Regions" section, the "Polynucleotide constructs" section, the "Oligonucleotide
Probes And Primers" section, the "Recombinant Vectors" section and the "Cell Hosts" section.

Further transgenic animals according to the invention contain in their somatic cells and/or
in their germ line cells a polynucleotide comprising a biallelic marker selected from the group
consisting of A1 to A32, and the complements thereof.

In a first preferred embodiment, these transgenic animals may be good experimental models
for studying the diverse pathologies related to cell differentiation, in particular the
transgenic animals within the genome of which one or several copies of a polynucleotide encoding
a native HKLP protein, or alternatively a mutant HKLP protein, have been inserted.

In a second preferred embodiment, these transgenic animals may express a desired
polypeptide of interest under the control of the regulatory polynucleotides of the HKLP gene,
leading to good yields in the synthesis of this protein of interest, and possibly to tissue-specific
expression of this protein of interest.

The transgenic animals of the invention may be designed according to
conventional techniques well known to one skilled in the art. For more details regarding the
production of transgenic animals, and specifically transgenic mice, reference may be made to US Patents
Nos 4,873,191, issued Oct. 10, 1989; 5,464,764, issued Nov. 7, 1995; and 5,789,215, issued Aug. 4,
1998; these documents being herein incorporated by reference to disclose methods for producing
transgenic mice.

Transgenic animals of the present invention are produced by the application of procedures
which result in an animal with a genome that has incorporated exogenous genetic material. The
procedure involves obtaining the genetic material, or a portion thereof, which encodes either a
HKLP coding sequence, a HKLP regulatory polynucleotide or a DNA sequence encoding a HKLP
antisense polynucleotide such as described in the present specification.

A recombinant polynucleotide of the invention is inserted into an embryonic stem (ES) cell
line. The insertion is preferably made using electroporation, such as described by Thomas et
al. (1987). The cells subjected to electroporation are screened (e.g. by selection via selectable
markers, by PCR or by Southern blot analysis) to find positive cells which have integrated the
exogenous recombinant polynucleotide into their genome, preferably via a homologous
recombination event. An illustrative positive-negative selection procedure that may be used
according to the invention is described by Mansour et al. (1988).

Then, the positive cells are isolated, cloned and injected into 3.5-day-old blastocysts from
mice, such as described by Bradley (1987). The blastocysts are then inserted into a female host
animal and allowed to grow to term.

Alternatively, the positive ES cells are brought into contact with 2.5-day-old embryos at the
8-16 cell stage (morulae), such as described by Wood et al. (1993) or by Nagy et al. (1993), the ES
cells being internalized to colonize extensively the blastocyst, including the cells which will give
rise to the germ line.

The offspring of the female host are tested to determine which animals are transgenic, i.e.
include the inserted exogenous DNA sequence, and which are wild-type.

Thus, the present invention also concerns a transgenic animal containing a nucleic acid, a 
recombinant expression vector or a recombinant host cell according to the invention. 

Recombinant Cell Lines Derived From The Transgenic Animals Of The Invention. 

A further object of the invention consists of recombinant host cells obtained from a
transgenic animal described herein. In one embodiment the invention encompasses cells derived
from non-human host mammals and animals comprising a recombinant vector of the invention or a
HKLP gene disrupted by homologous recombination with a knock out vector.

Recombinant cell lines may be established in vitro from cells obtained from any tissue of a
transgenic animal according to the invention, for example by transfection of primary cell cultures
with vectors expressing onc-genes such as SV40 large T antigen, as described by Chou (1989) and
Shay et al. (1991).

Methods for screening substances interacting with a HKLP polypeptide

For the purpose of the present invention, a ligand means a molecule, such as a protein, a
peptide, an antibody or any synthetic chemical compound, capable of binding to the HKLP protein or
one of its fragments or variants, or of modulating the expression of the polynucleotide coding for
HKLP or a fragment or variant thereof.

In the ligand screening method according to the present invention, a biological sample or a
defined molecule to be tested as a putative ligand of the HKLP protein is brought into contact with
the corresponding purified HKLP protein, for example the corresponding purified recombinant
HKLP protein produced by a recombinant cell host as described hereinbefore, in order to form a
complex between this protein and the putative ligand molecule to be tested.

As an illustrative example, to study the interaction of the HKLP protein, or a fragment
comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more
preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said
contiguous span includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No
4, with drugs or small molecules, such as molecules generated through combinatorial chemistry
approaches, the microdialysis coupled to HPLC method described by Wang et al. (1997) or the
affinity capillary electrophoresis method described by Bush et al. (1997), the disclosures of which
are incorporated by reference, can be used.

In further methods, peptides, drugs, fatty acids, lipoproteins, or small molecules which
interact with the HKLP protein, or a fragment comprising a contiguous span of at least 6 amino
acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or
100 amino acids of SEQ ID No 4, wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of
the amino acid positions 1-478 of the SEQ ID No 4, may be identified using assays such as the
following. The molecule to be tested for binding is labeled with a detectable label, such as a
fluorescent, radioactive, or enzymatic tag, and placed in contact with immobilized HKLP protein, or
a fragment thereof, under conditions which permit specific binding to occur. After removal of
non-specifically bound molecules, bound molecules are detected using appropriate means.

Another object of the present invention consists of methods and kits for the screening of
candidate substances that interact with HKLP polypeptide.

The present invention pertains to methods for screening substances of interest that interact 
with a HKLP protein or one fragment or variant thereof. By their capacity to bind covalently or 
non-covalently to a HKLP protein or to a fragment or variant thereof, these substances or molecules 
may be advantageously used both in vitro and in vivo. 

In vitro, said interacting molecules may be used as detection means in order to identify the
presence of a HKLP protein in a sample, preferably a biological sample.

A method for the screening of a candidate substance comprises the following steps:

a) providing a polypeptide consisting of a HKLP protein or a fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span
includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4, or a variant
thereof;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance;

d) detecting the complexes formed between said polypeptide and said candidate substance.

The invention further concerns a kit for the screening of a candidate substance interacting
with the HKLP polypeptide, wherein said kit comprises:

a) a HKLP protein having an amino acid sequence selected from the group consisting of the
amino acid sequences of SEQ ID No 4 or a peptide fragment comprising a contiguous span of at
least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25,
30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span includes at least 1, 2,
3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4, or a variant thereof;

b) optionally means useful to detect the complex formed between the HKLP protein or a
peptide fragment or a variant thereof and the candidate substance.

In a preferred embodiment of the kit described above, the detection means consist of
monoclonal or polyclonal antibodies directed against the HKLP protein or a peptide fragment or a
variant thereof.

Various candidate substances or molecules can be assayed for interaction with a HKLP
polypeptide. These substances or molecules include, without being limited to, natural or synthetic
organic compounds or molecules of biological origin such as polypeptides. When the candidate
substance or molecule consists of a polypeptide, this polypeptide may be the resulting expression
product of a phage clone belonging to a phage-based random peptide library, or alternatively the
polypeptide may be the resulting expression product of a cDNA library cloned in a vector suitable
for performing a two-hybrid screening assay.

The invention also pertains to kits useful for performing the hereinbefore described
screening method. Preferably, such kits comprise a HKLP polypeptide or a fragment or a variant
thereof, and optionally means useful to detect the complex formed between the HKLP polypeptide
or its fragment or variant and the candidate substance. In a preferred embodiment the detection
means consist of monoclonal or polyclonal antibodies directed against the corresponding HKLP
polypeptide or a fragment or a variant thereof.

A. Candidate ligands obtained from random peptide libraries 

In a particular embodiment of the screening method, the putative ligand is the expression
product of a DNA insert contained in a phage vector (Parmley and Smith, 1988). Specifically,
random peptide phage libraries are used. The random DNA inserts encode peptides of 8 to 20
amino acids in length (Oldenburg K.R. et al., 1992; Valadon P., et al., 1996; Lucas A.H., 1994;
Westerink M.A.J., 1995; Felici F. et al., 1991). According to this particular embodiment, the
recombinant phages expressing a protein that binds to the immobilized HKLP protein are retained and
the complex formed between the HKLP protein and the recombinant phage may be subsequently
immunoprecipitated by a polyclonal or a monoclonal antibody directed against the HKLP protein.

Once the ligand library in recombinant phages has been constructed, the phage population is
brought into contact with the immobilized HKLP protein. Then the preparation of complexes is
washed in order to remove the non-specifically bound recombinant phages. The phages that bind
specifically to the HKLP protein are then eluted by a buffer (acid pH) or immunoprecipitated by the
monoclonal antibody produced by the anti-HKLP hybridoma, and this phage population is
subsequently amplified by an over-infection of bacteria (for example E. coli). The selection step
may be repeated several times, preferably 2-4 times, in order to select the more specific recombinant
phage clones. The last step consists of characterizing the peptide produced by the selected
recombinant phage clones either by expression in infected bacteria and isolation, by expressing the
phage insert in another host-vector system, or by sequencing the insert contained in the selected
recombinant phages.

B. Candidate ligands obtained by competition experiments.

Alternatively, peptides, drugs or small molecules which bind to the HKLP protein, or a
fragment comprising a contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino
acids, more preferably at least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4,
wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of
the SEQ ID No 4, may be identified in competition experiments. In such assays, the HKLP protein,
or a fragment thereof, is immobilized to a surface, such as a plastic plate. Increasing amounts of the
peptides, drugs or small molecules are placed in contact with the immobilized HKLP protein, or a
fragment thereof, in the presence of a detectably labeled known HKLP protein ligand. For example,
the HKLP ligand may be detectably labeled with a fluorescent, radioactive, or enzymatic tag. The
ability of the test molecule to bind the HKLP protein, or a fragment thereof, is determined by
measuring the amount of detectably labeled known ligand bound in the presence of the test
molecule. A decrease in the amount of known ligand bound to the HKLP protein, or a fragment
thereof, when the test molecule is present indicates that the test molecule is able to bind to the
HKLP protein, or a fragment thereof.
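
As a rough illustration of how readings from such a competition experiment might be evaluated, the following sketch converts the bound-label signal into a percent displacement and interpolates the test molecule concentration at which half of the labeled known ligand is displaced. The function names, the example readings and the log-linear interpolation are assumptions introduced for illustration only; they are not part of the method described above.

```python
import math

def percent_displacement(signal, signal_no_competitor, background=0.0):
    """Percent of labeled known ligand displaced at one test concentration."""
    span = signal_no_competitor - background
    if span <= 0:
        raise ValueError("signal without competitor must exceed background")
    return 100.0 * (1.0 - (signal - background) / span)

def estimate_ic50(concentrations, signals, signal_no_competitor, background=0.0):
    """Interpolate (on a log scale) the concentration giving 50% displacement.

    Returns None when the readings never cross 50% displacement.
    """
    points = sorted(zip(concentrations, signals))
    disp = [(c, percent_displacement(s, signal_no_competitor, background))
            for c, s in points]
    for (c_lo, d_lo), (c_hi, d_hi) in zip(disp, disp[1:]):
        if d_lo < 50.0 <= d_hi:
            frac = (50.0 - d_lo) / (d_hi - d_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical readings (arbitrary signal units) at increasing molar
# concentrations of the test molecule; these numbers are invented.
example_concs = [1e-9, 1e-8, 1e-7, 1e-6]
example_signals = [980.0, 800.0, 400.0, 120.0]
example_ic50 = estimate_ic50(example_concs, example_signals,
                             signal_no_competitor=1000.0, background=100.0)
```

A test molecule that does not bind the HKLP protein would leave the signal near its uncompeted value, and the function would return None.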

C. Candidate ligands obtained by affinity chromatography.

Proteins or other molecules interacting with the HKLP protein, or a fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span
includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4, can also be
found using affinity columns which contain the HKLP protein, or a fragment thereof. The HKLP
protein, or a fragment thereof, may be attached to the column using conventional techniques
including chemical coupling to a suitable column matrix such as agarose, Affi-Gel®, or other
matrices familiar to those of skill in the art. In some embodiments of this method, the affinity column
contains chimeric proteins in which the HKLP protein, or a fragment thereof, is fused to glutathione
S-transferase (GST). A mixture of cellular proteins or pool of expressed proteins as described above is
applied to the affinity column. Proteins or other molecules interacting with the HKLP protein, or a
fragment thereof, attached to the column can then be isolated and analyzed on 2-D electrophoresis
gel as described in Ramunsen et al. (1997), the disclosure of which is incorporated by reference.
Alternatively, the proteins retained on the affinity column can be purified by electrophoresis based
methods and sequenced. The same method can be used to isolate antibodies, to screen phage display
products, or to screen phage display human antibodies.

D. Candidate ligands obtained by optical biosensor methods

Proteins interacting with the HKLP protein, or a fragment comprising a contiguous span of
at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at least 12, 15, 20, 25,
30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span includes at least 1, 2,
3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4, can also be screened by using an
Optical Biosensor as described in Edwards and Leatherbarrow (1997) and also in Szabo et al.
(1995), the disclosures of which are incorporated by reference. This technique permits the detection of
interactions between molecules in real time, without the need for labeled molecules. This technique
is based on the surface plasmon resonance (SPR) phenomenon. Briefly, the candidate ligand
molecule to be tested is attached to a surface (such as a carboxymethyl dextran matrix). A light
beam is directed towards the side of the surface that does not contain the sample to be tested and is
reflected by said surface. The SPR phenomenon causes a decrease in the intensity of the reflected
light at a specific combination of angle and wavelength. The binding of candidate ligand
molecules causes a change in the refractive index at the surface, which change is detected as a
change in the SPR signal. For screening of candidate ligand molecules or substances that are able to
interact with the HKLP protein, or a fragment thereof, the HKLP protein, or a fragment thereof, is
immobilized onto a surface. This surface consists of one side of a cell through which flows the
candidate molecule to be assayed. The binding of the candidate molecule on the HKLP protein, or a
fragment thereof, is detected as a change of the SPR signal. The candidate molecules tested may be
proteins, peptides, carbohydrates, lipids, or small molecules generated by combinatorial chemistry.
This technique may also be performed by immobilizing eukaryotic or prokaryotic cells or lipid
vesicles exhibiting an endogenous or a recombinantly expressed HKLP protein at their surface.

The main advantage of the method is that it allows the determination of the association rate
between the HKLP protein and molecules interacting with the HKLP protein. It is thus possible to
select specifically ligand molecules interacting with the HKLP protein, or a fragment thereof,
through strong or conversely weak association constants.
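
By way of illustration only, the relationship between rate constants and the SPR response can be sketched with the simple 1:1 (Langmuir) binding model that is commonly applied to biosensor data. The text above does not prescribe any particular kinetic model, so the model itself, the function names and the numerical values below are all assumptions.

```python
import math

def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on

def association_response(t, conc, k_on, k_off, r_max):
    """SPR response at time t (s) while analyte at concentration conc (M) flows."""
    k_obs = k_on * conc + k_off              # observed rate of approach to plateau
    r_eq = r_max * k_on * conc / k_obs       # plateau (equilibrium) response
    return r_eq * (1.0 - math.exp(-k_obs * t))

def dissociation_response(t, r_start, k_off):
    """SPR response at time t (s) after the analyte flow is replaced by buffer."""
    return r_start * math.exp(-k_off * t)

# Hypothetical rate constants, chosen only to exercise the functions.
example_kd = dissociation_constant(k_on=1e5, k_off=1e-3)
example_plateau = association_response(t=1e5, conc=example_kd,
                                       k_on=1e5, k_off=1e-3, r_max=100.0)
```

Under this model, a candidate flowed at a concentration equal to its K_D reaches half of the maximal response at the plateau, which is one way of ranking candidates by strong or weak binding.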

E. Candidate ligands obtained through a two-hybrid screening assay. 

The yeast two-hybrid system is designed to study protein-protein interactions in vivo (Fields
and Song, 1989), and relies upon the fusion of a bait protein to the DNA binding domain of the yeast
Gal4 protein. This technique is also described in US Patent N° 5,667,973 and US Patent
N° 5,283,173 (Fields et al.), the technical teachings of both patents being herein incorporated by
reference.

The general procedure of library screening by the two-hybrid assay may be performed as
described by Harper et al. (1993), by Cho et al. (1998) or by Fromont-Racine et al.
(1997).

The bait protein or polypeptide consists of a HKLP polypeptide or a fragment comprising a
contiguous span of at least 6 amino acids, preferably at least 8 to 10 amino acids, more preferably at
least 12, 15, 20, 25, 30, 40, 50, or 100 amino acids of SEQ ID No 4, wherein said contiguous span
includes at least 1, 2, 3, 5 or 10 of the amino acid positions 1-478 of the SEQ ID No 4, or a variant
thereof.

More precisely, the nucleotide sequence encoding the HKLP polypeptide or a fragment or 
variant thereof is fused to a polynucleotide encoding the DNA binding domain of the GAL4 protein, 
the fused nucleotide sequence being inserted in a suitable expression vector, for example pAS2 or 
pM3. 

Then, a human cDNA library is constructed in a specially designed vector, such that the
human cDNA insert is fused to a nucleotide sequence in the vector that encodes the transcriptional
activation domain of the GAL4 protein. Preferably, the vector used is the pACT vector. The polypeptides
encoded by the nucleotide inserts of the human cDNA library are termed "prey" polypeptides.

A third vector contains a detectable marker gene, such as the beta galactosidase gene or the CAT
gene, that is placed under the control of a regulatory sequence that is responsive to the binding of a
complete Gal4 protein containing both the transcriptional activation domain and the DNA binding
domain. For example, the vector pGSEC may be used.

Two different yeast strains are also used. As an illustrative but non-limiting example, the
two different yeast strains may be the following:

- Y190, the genotype of which is (MATa, leu2-3,112 ura3-52, trp1-901, his3-D200, ade2-101,
gal4D gal80D URA3 GAL-lacZ, LYS GAL-HIS3, cyhr),
- Y187, the genotype of which is (MATα gal4 gal80 his3 trp1-901 ade2-101 ura3-52 leu2-3,112
URA3 GAL-lacZ met-), which is the opposite mating type of Y190.

Briefly, 20 µg of pAS2/HKLP and 20 µg of pACT-cDNA library are co-transformed into
yeast strain Y190. The transformants are selected for growth on minimal media lacking histidine,
leucine and tryptophan, but containing the histidine synthesis inhibitor 3-AT (50 mM). Positive
colonies are screened for beta galactosidase by filter lift assay. The double positive colonies (His+,
beta-gal+) are then grown on plates lacking histidine and leucine, but containing tryptophan and
cycloheximide (10 mg/ml), to select for loss of pAS2/HKLP plasmids but retention of pACT-cDNA
library plasmids. The resulting Y190 strains are mated with Y187 strains expressing HKLP or
non-related control proteins, such as cyclophilin B, lamin, or SNF1, as Gal4 fusions as described by
Harper et al. (1993) and by Bram et al. (1993), and screened for beta galactosidase by filter lift
assay. Yeast clones that are beta gal- after mating with the control Gal4 fusions are considered false
positives.

In another embodiment of the two-hybrid method according to the invention, interaction
of the HKLP protein, or a fragment or variant thereof, with cellular proteins may be assessed using the
Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech). As described in the manual
accompanying the Matchmaker Two Hybrid System 2 (Catalog No. K1604-1, Clontech), the disclosure
of which is incorporated herein by reference, nucleic acids encoding the HKLP protein or a portion
thereof are inserted into an expression vector such that they are in frame with DNA encoding the DNA
binding domain of the yeast transcriptional activator GAL4. A desired cDNA, preferably human
cDNA, is inserted into a second expression vector such that it is in frame with DNA encoding the
activation domain of GAL4. The two expression plasmids are transformed into yeast and the yeast are
plated on selection medium which selects for expression of selectable markers on each of the expression
vectors as well as GAL4 dependent expression of the HIS3 gene. Transformants capable of growing on
medium lacking histidine are screened for GAL4 dependent lacZ expression. Those cells which are
positive in both the histidine selection and the lacZ assay contain an interaction between HKLP and the
protein or peptide encoded by the initially selected cDNA insert.

Methods For Screening Substances Modulating The Activity Of The HKLP Protein

The invention also concerns a method for screening new agents, or candidate substances,
which modulate the activity of the HKLP protein or a fragment thereof. Preferably, the HKLP
protein or a fragment thereof is a polypeptide comprising a contiguous span of at least 6 amino
acids of SEQ ID No 4, wherein said contiguous span includes at least 1 of the amino acid positions
1-478 of the SEQ ID No 4. Preferably, the candidate substance is mixed with the HKLP protein and
the activity of the HKLP protein is measured. Candidate substances include, without being limited
to, natural or synthetic organic compounds or molecules of biological origin such as polypeptides.

Various assays for biological activity of motor proteins are known (Sato-Yoshitake et al.,
1992 and Scholey, 1993). In vitro motility assays to characterize specific KLPs, for example, include
microtubule gliding assays demonstrating translocation of microtubules, organelle movement assays
to visualize the movement of a cargo of interest, and single molecule motility assays (Howard 1989,
Block 1990) to characterize structural elements.

In short, microtubule gliding assays can be performed by applying a sample containing the
HKLP protein to a glass surface without any treatment and incubating with microtubules
reconstituted from polymerized tubulin. Microtubule translocation activity and the direction of
movement can be determined as in Nangaku 1994 by observing the movement of axonemes on the
glass surface. Organelle movement assays can be performed by applying a composition containing
the cargo of interest with a solution containing vesicles and the HKLP protein to a glass surface.
Movement of the organelle can be observed, for example, by using a cargo-specific fluorescent
probe to stain vesicles before incubation with the HKLP protein.
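
As an illustrative sketch of how translocation activity observed in such assays might be quantified, the following function computes a mean gliding speed from positions of a microtubule (or vesicle) tracked over successive video frames. The tracking itself, the function name, the units and the example coordinates are assumptions introduced here; the text above does not specify how the movement is measured.

```python
import math

def mean_gliding_speed(track_um, frame_interval_s):
    """Mean speed (um/s) along a tracked path.

    track_um is a list of (x, y) positions in micrometers, one per frame;
    frame_interval_s is the time between consecutive frames in seconds.
    """
    if len(track_um) < 2:
        raise ValueError("at least two tracked positions are required")
    if frame_interval_s <= 0:
        raise ValueError("frame interval must be positive")
    path_length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track_um, track_um[1:])
    )
    return path_length / (frame_interval_s * (len(track_um) - 1))

# Hypothetical track: three positions recorded 2 s apart.
example_speed = mean_gliding_speed([(0, 0), (3, 4), (6, 8)], frame_interval_s=2.0)
```

Comparing such speeds with and without a candidate substance would be one possible readout of a modulating effect; determining the direction of movement additionally requires polarity-marked microtubules or axonemes, which this sketch does not address.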

Methods For Inhibiting The Expression Of A HKLP Gene

Other therapeutic compositions according to the present invention advantageously comprise
an oligonucleotide fragment of the nucleic sequence of HKLP as an antisense tool or a triple helix
tool that inhibits the expression of the corresponding HKLP gene. A preferred fragment of the
nucleic sequence of HKLP comprises an allele of at least one of the biallelic markers A1 to A32.

Antisense Approach 

Preferred methods using antisense polynucleotides according to the present invention are the
procedures described by Sczakiel et al. (1995).

Preferably, the antisense tools are chosen among the polynucleotides (15-200 bp long) that
are complementary to the 5' end of the HKLP mRNA. In another embodiment, a combination of
different antisense polynucleotides complementary to different parts of the desired targeted gene is
used.

Preferred antisense polynucleotides according to the present invention are complementary to
a sequence of the mRNAs of HKLP that contains either the translation initiation codon ATG or a
splicing donor or acceptor site.
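
The selection of an antisense sequence spanning the initiation codon can be sketched as a reverse-complement operation on the targeted mRNA window. The sketch below is illustrative only: the function names and the example sequence are invented, and taking the first AUG in the sequence as the initiation codon is a simplifying assumption that would need to be checked against the actual coding sequence.

```python
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_rna(target):
    """Reverse complement (as RNA, written 5' to 3') of a target mRNA segment."""
    seq = target.upper().replace("T", "U")
    if set(seq) - set("ACGU"):
        raise ValueError("target contains characters other than A, C, G, U/T")
    return seq.translate(_RNA_COMPLEMENT)[::-1]

def antisense_spanning_start_codon(mrna, upstream=10, downstream=10):
    """Antisense sequence covering the first AUG plus flanking bases.

    Assumption: the first AUG found is treated as the initiation codon.
    """
    seq = mrna.upper().replace("T", "U")
    start = seq.find("AUG")
    if start < 0:
        raise ValueError("no AUG codon found in the supplied sequence")
    begin = max(0, start - upstream)
    end = min(len(seq), start + 3 + downstream)
    return antisense_rna(seq[begin:end])

# Invented 26-nucleotide sequence, used only to exercise the functions.
example_mrna = "GGCCGCCACCAUGGCUAGCAAGGAGA"
example_oligo = antisense_spanning_start_codon(example_mrna, upstream=5, downstream=5)
```

The short flanks in the example keep it readable; windows in the 15-200 base range mentioned above would be obtained by enlarging `upstream` and `downstream`.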

The antisense nucleic acids should have a length and melting temperature sufficient to
permit formation of an intracellular duplex having sufficient stability to inhibit the expression of the
HKLP mRNA in the duplex. Strategies for designing antisense nucleic acids suitable for use in gene
therapy are disclosed in Green et al. (1986) and Izant and Weintraub (1984), the disclosures of
which are incorporated herein by reference.

In some strategies, antisense molecules are obtained by reversing the orientation of the
HKLP coding region with respect to a promoter so as to transcribe the opposite strand from that
which is normally transcribed in the cell. The antisense molecules may be transcribed using in vitro
transcription systems such as those which employ T7 or SP6 polymerase to generate the transcript.
Another approach involves transcription of HKLP antisense nucleic acids in vivo by operably
linking DNA containing the antisense sequence to a promoter in a suitable expression vector.

Alternatively, suitable antisense strategies are those described by Rossi et al. (1991), in the
International Applications Nos. WO 94/23026, WO 95/04141, WO 92/18522 and in the European
Patent Application No. EP 0 572 287 A2.

An alternative to the antisense technology that is used according to the present invention
consists of using ribozymes that will bind to a target sequence via their complementary
polynucleotide tail and that will cleave the corresponding RNA by hydrolyzing its target site
(namely "hammerhead ribozymes"). Briefly, the simplified cycle of a hammerhead ribozyme
consists of (1) sequence specific binding to the target RNA via complementary antisense sequences;
(2) site-specific hydrolysis of the cleavable motif of the target strand; and (3) release of cleavage
products, which gives rise to another catalytic cycle. Indeed, the use of long-chain antisense
polynucleotides (at least 30 bases long) or ribozymes with long antisense arms is advantageous. A
preferred delivery system for antisense ribozymes is achieved by covalently linking these antisense
ribozymes to lipophilic groups or by using liposomes as a convenient vector. Preferred antisense
ribozymes according to the present invention are prepared as described by Sczakiel et al. (1995), the
specific preparation procedures referred to in said article being herein incorporated by
reference.

Triple Helix Approach 

The HKLP genomic DNA may also be used to inhibit the expression of the HKLP gene 
based on intracellular triple helix formation. 

Triple helix oligonucleotides are used to inhibit transcription from a genome. They are
particularly useful for studying alterations in cell activity when it is associated with a particular
gene.

Similarly, a portion of the HKLP genomic DNA can be used to study the effect of inhibiting
HKLP transcription within a cell. Traditionally, homopurine sequences were considered the most
useful for triple helix strategies. However, homopyrimidine sequences can also inhibit gene
expression. Such homopyrimidine oligonucleotides bind to the major groove at
homopurine:homopyrimidine sequences. Thus, both types of sequences from the HKLP genomic
DNA are contemplated within the scope of this invention.

To carry out gene therapy strategies using the triple helix approach, the sequences of the
HKLP genomic DNA are first scanned to identify 10-mer to 20-mer homopyrimidine or homopurine
stretches which could be used in triple-helix based strategies for inhibiting HKLP expression.
Following identification of candidate homopyrimidine or homopurine stretches, their efficiency in
inhibiting HKLP expression is assessed by introducing varying amounts of oligonucleotides
containing the candidate sequences into tissue culture cells which express the HKLP gene.
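
The scanning step can be sketched, for illustration, as a regular-expression search for uninterrupted purine (A, G) or pyrimidine (C, T) runs. The function name, the choice to trim runs longer than 20 bases to their first 20 bases, and the example sequence are assumptions made here and are not taken from the procedure described above.

```python
import re

def find_homo_stretches(dna, min_len=10, max_len=20):
    """Locate homopurine and homopyrimidine runs of at least min_len bases.

    Returns a sorted list of (start, kind, sequence) tuples, with start as a
    0-based offset; runs longer than max_len are trimmed to max_len bases.
    """
    seq = dna.upper()
    hits = []
    for kind, base_class in (("homopurine", "AG"), ("homopyrimidine", "CT")):
        pattern = re.compile("[%s]{%d,}" % (base_class, min_len))
        for match in pattern.finditer(seq):
            hits.append((match.start(), kind, match.group()[:max_len]))
    return sorted(hits)

# Invented sequence containing one 12-base purine run and one 11-base
# pyrimidine run, used only to exercise the function.
example_dna = "TTACT" + "AGGAGAAGGGAA" + "CATG" + "CTTCCTCTTCT" + "GA"
example_hits = find_homo_stretches(example_dna)
```

Only the strand supplied is scanned; since a homopurine run on one strand is a homopyrimidine run on the complementary strand, scanning a single strand for both classes covers both orientations.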

The oligonucleotides can be introduced into the cells using a variety of methods known to
those skilled in the art, including but not limited to calcium phosphate precipitation, DEAE-Dextran,
electroporation, liposome-mediated transfection or native uptake.

Treated cells are monitored for altered cell function or reduced HKLP expression using
techniques such as Northern blotting, RNase protection assays, or PCR based strategies to monitor
the transcription levels of the HKLP gene in cells which have been treated with the oligonucleotide.

The oligonucleotides which are effective in inhibiting gene expression in tissue culture cells
may then be introduced in vivo using the techniques described above for the antisense approach, at a
dosage calculated from the in vitro results as described for that approach.

In some embodiments, the natural (beta) anomers of the oligonucleotide units can be
replaced with alpha anomers to render the oligonucleotide more resistant to nucleases. Further, an
intercalating agent such as ethidium bromide, or the like, can be attached to the 3' end of the alpha
oligonucleotide to stabilize the triple helix. For information on the generation of oligonucleotides
suitable for triple helix formation see Griffin et al. (1989), which is hereby incorporated by this
reference.

Computer-Related Embodiments

As used herein, the term "nucleic acid codes of the invention" encompasses the nucleotide
sequences comprising, consisting essentially of, or consisting of any one of the following:
a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 1, wherein said contiguous span comprises at least 1, 2, 3, 5,
or 10 of the following nucleotide positions of SEQ ID No 1: 1-39624, 39705-40589, 40666-43629,
43710-44203, 44311-45125, 45210-45440, 45622-45717, 45791-68580, 68675-70246, 70396-72421,
72601-73295, 73434-74648, 74898-83055, 83175-85192, 85279-85609, 85740-85906, 86070-88304,
88396-90585, 90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167, 101274-109465,
109581-110228, 110363-111819, 111882-113636, 113783-113945, 114186-117002, 117075-119676, and
119677-121162;
b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 2: 1-1600,
1751-2138, 2332-2539, 2659-3829 and 8885-10884;
c) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 3: 391-1619
and 6988-10682;
d) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 1, 2 or 3, or the complements thereof, wherein said contiguous
span comprises at least 1, 2, 3, 5, or 10 nucleotide positions of any one of the following ranges of
nucleotide positions of: (1) SEQ ID No 1: 1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000,
5001-6000, 6001-7000, 7001-8000, 8001-9000, 9001-10000, 10001-11000, 11001-12000, 12001-13000,
13001-14000, 14001-15000, 15001-16000, 16001-17000, 17001-18000, 18001-19000, 19001-20000,
20001-21000, 21001-22000, 22001-23000, 23001-24000, 24001-25000, 25001-26000, 26001-27000,
27001-28000, 28001-29000, 29001-30000, 30001-31000, 31001-32000, 32001-33000, 33001-34000,
34001-35000, 35001-36000, 36001-37000, 37001-38000, 38001-39000, 39001-39624, 39705-40589,
40666-43629, 43710-44203, 44311-45125, 45210-45440, 45622-45717, 45791-68580, 68675-70246,
70396-72421, 72601-73295, 73434-74648, 74898-83055, 83175-85192, 85279-85609, 85740-85906,
86070-88304, 88396-90585, 90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167,
101274-109465, 109581-110228, 110363-111819, 111882-113636, 113783-113945, 114186-117002,
117075-119676, and 119677-121162; and (2) SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and
8885-10884;
e) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 1 or the complements thereof, wherein said contiguous span
comprises a G at position 7159 of SEQ ID No 1;
f) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span
comprises a C either at position 2551 or 4500 of SEQ ID No 2;
g) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span
comprises a nucleotide selected in the group consisting of a C at position 5487, and a C at position
6265 of SEQ ID No 3; and,
j) a nucleotide sequence complementary to any one of the preceding nucleotide sequences.
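
The recurring condition that a contiguous span "comprises at least 1, 2, 3, 5, or 10" of the listed nucleotide positions can be checked mechanically. The following is a minimal sketch, assuming 1-based inclusive coordinates; the function name positions_covered is hypothetical, and only the two ranges given above for SEQ ID No 3 are used as sample data.

```python
# Ranges listed above for SEQ ID No 3 (1-based, inclusive).
SEQ3_RANGES = [(391, 1619), (6988, 10682)]


def positions_covered(span_start, span_length, ranges):
    """Count how many positions of a contiguous span fall inside the listed ranges.

    Hypothetical helper; assumes the ranges do not overlap one another.
    """
    span_end = span_start + span_length - 1
    covered = 0
    for low, high in ranges:
        overlap = min(span_end, high) - max(span_start, low) + 1
        if overlap > 0:
            covered += overlap
    return covered


if __name__ == "__main__":
    # A 12-nucleotide span starting at position 380 ends at position 391.
    print(positions_covered(380, 12, SEQ3_RANGES))
```

A span would satisfy a given threshold when the returned count is at least that threshold.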

The "nucleic acid codes of the invention" further encompass nucleotide sequences
homologous to:
a) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 1, wherein said contiguous span comprises at least 1, 2, 3, 5,
or 10 of the following nucleotide positions of SEQ ID No 1: 1-39624, 39705-40589, 40666-43629,
43710-44203, 44311-45125, 45210-45440, 45622-45717, 45791-68580, 68675-70246, 70396-72421,
72601-73295, 73434-74648, 74898-83055, 83175-85192, 85279-85609, 85740-85906, 86070-88304,
88396-90585, 90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167, 101274-109465,
109581-110228, 110363-111819, 111882-113636, 113783-113945, 114186-117002, 117075-119676, and
119677-121162;
b) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 2 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 2: 1-1600,
1751-2138, 2332-2539, 2659-3829 and 8885-10884;
c) a contiguous span of at least 12, 15, 18, 20, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200,
500, or 1000 nucleotides of SEQ ID No 3 or the complements thereof, wherein said contiguous span
comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 3: 391-1619
and 6988-10682; and
d) sequences complementary to all of the preceding sequences.
Homologous sequences refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%,
80%, or 75% homology to these contiguous spans. Homology may be determined using any method
described herein, including BLAST2N with the default parameters or with any modified parameters.
Homologous sequences also may include RNA sequences in which uridines replace the thymines in the
nucleic acid codes of the invention. It will be appreciated that the nucleic acid codes of the invention
can be represented in the traditional single character format (See the inside back cover of Stryer, Lubert.
Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in any other format or code which
records the identity of the nucleotides in a sequence.

As used herein, the term "polypeptide codes of the invention" encompasses the polypeptide
sequences comprising a contiguous span of at least 6, 8, 10, 12, 15, 20, 25, 30, 40, 50, or 100 amino
acids of SEQ ID No 4, wherein said contiguous span includes at least 1, 2, 3, 5 or 10 of the amino
acid positions 1-478 of the SEQ ID No 4. It will be appreciated that the polypeptide codes of the
invention can be represented in the traditional single character format or three letter format (See the
inside back cover of Stryer, Lubert. Biochemistry, 3rd edition. W. H. Freeman & Co., New York.) or in
any other format or code which records the identity of the polypeptides in a sequence.

It will be appreciated by those skilled in the art that the nucleic acid codes of the invention and
polypeptide codes of the invention can be stored, recorded, and manipulated on any medium which can
be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a
process for storing information on a computer medium. A skilled artisan can readily adopt any of the
presently known methods for recording information on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid codes of the invention, or one or more of the
polypeptide codes of the invention. Another aspect of the present invention is a computer readable
medium having recorded thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 nucleic acid codes of the
invention. Another aspect of the present invention is a computer readable medium having recorded
thereon at least 2, 5, 10, 15, 20, 25, 30, or 50 polypeptide codes of the invention.

Computer readable media include magnetically readable media, optically readable media,
electronically readable media and magnetic/optical media. For example, the computer readable media
may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random
Access Memory (RAM), or Read Only Memory (ROM) as well as other types of media known to
those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems which
store and manipulate the sequence information described herein. One example of a computer system
100 is illustrated in block diagram form in Figure 1. As used herein, "a computer system" refers to the
hardware components, software components, and data storage components used to analyze the
nucleotide sequences of the nucleic acid codes of the invention or the amino acid sequences of the
polypeptide codes of the invention. In one embodiment, the computer system 100 is a Sun Enterprise
1000 server (Sun Microsystems, Palo Alto, CA). The computer system 100 preferably includes a
processor for processing, accessing and manipulating the sequence data. The processor 105 can be any
well-known type of central processing unit, such as the Pentium III from Intel Corporation, or similar
processor from Sun, Motorola, Compaq or International Business Machines.

Preferably, the computer system 100 is a general purpose system that comprises the processor
105 and one or more internal data storage components 110 for storing data, and one or more data
retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can
readily appreciate that any one of the currently available computer systems is suitable.

In one particular embodiment, the computer system 100 includes a processor 105 connected to
a bus which is connected to a main memory 115 (preferably implemented as RAM) and one or more
internal data storage devices 110, such as a hard drive and/or other computer readable media having
data recorded thereon. In some embodiments, the computer system 100 further includes one or more
data retrieving devices 118 for reading the data stored on the internal data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk
drive, a magnetic tape drive, etc. In some embodiments, the internal data storage device 110 is a
removable computer readable medium such as a floppy disk, a compact disk, a magnetic tape, etc.
containing control logic and/or data recorded thereon. The computer system 100 may advantageously
include or be programmed by appropriate software for reading the control logic and/or the data from the
data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output to a computer
user. It should also be noted that the computer system 100 can be linked to other computer systems
125a-c in a network or wide area network to provide centralized access to the computer system 100.

Software for accessing and processing the nucleotide sequences of the nucleic acid codes of the
invention or the amino acid sequences of the polypeptide codes of the invention (such as search tools,
compare tools, and modeling tools etc.) may reside in main memory 115 during execution.

In some embodiments, the computer system 100 may further comprise a sequence comparer for
comparing the above-described nucleic acid codes of the invention or the polypeptide codes of the
invention stored on a computer readable medium to reference nucleotide or polypeptide sequences
stored on a computer readable medium. A "sequence comparer" refers to one or more programs which
are implemented on the computer system 100 to compare a nucleotide or polypeptide sequence with
other nucleotide or polypeptide sequences and/or compounds including but not limited to peptides,
peptidomimetics, and chemicals stored within the data storage means. For example, the sequence
comparer may compare the nucleotide sequences of nucleic acid codes of the invention or the amino
acid sequences of the polypeptide codes of the invention stored on a computer readable medium to
reference sequences stored on a computer readable medium to identify homologies, motifs implicated in
biological function, or structural motifs. The various sequence comparer programs identified elsewhere
in this patent specification are particularly contemplated for use in this aspect of the invention.

Figure 2 is a flow diagram illustrating one embodiment of a process 200 for comparing a new
nucleotide or protein sequence with a database of sequences in order to determine the homology levels
between the new sequence and the sequences in the database. The database of sequences can be a
private database stored within the computer system 100, or a public database such as GENBANK, PIR
or SWISSPROT that is available through the Internet.

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new
sequence to be compared is stored to a memory in a computer system 100. As discussed above, the
memory could be any type of memory, including RAM or an internal storage device.

The process 200 then moves to a state 204 wherein a database of sequences is opened for
analysis and comparison. The process 200 then moves to a state 206 wherein the first sequence stored
in the database is read into a memory on the computer. A comparison is then performed at a state 210
to determine if the first sequence is the same as the new sequence. It is important to note that this
step is not limited to performing an exact comparison between the new sequence and the first sequence
in the database. Methods are well known to those of skill in the art for comparing two
nucleotide or protein sequences, even if they are not identical. For example, gaps can be introduced into
one sequence in order to raise the homology level between the two tested sequences. The parameters
that control whether gaps or other features are introduced into a sequence during comparison are
normally entered by the user of the computer system.

Once a comparison of the two sequences has been performed at the state 210, a determination is
made at a decision state 212 whether the two sequences are the same. Of course, the term "same" is not
limited to sequences that are absolutely identical. Sequences that are within the homology parameters
entered by the user will be marked as "same" in the process 200.

If a determination is made that the two sequences are the same, the process 200 moves to a state
214 wherein the name of the sequence from the database is displayed to the user. This state notifies the
user that the sequence with the displayed name fulfills the homology constraints that were entered.
Once the name of the stored sequence is displayed to the user, the process 200 moves to a decision state
218 wherein a determination is made whether more sequences exist in the database. If no more
sequences exist in the database, then the process 200 terminates at an end state 220. However, if more
sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is
moved to the next sequence in the database so that it can be compared to the new sequence. In this
manner, the new sequence is aligned and compared with every sequence in the database.

It should be noted that if a determination had been made at the decision state 212 that the
sequences were not homologous, then the process 200 would move immediately to the decision state
218 in order to determine if any other sequences were available in the database for comparison.
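
The loop of process 200 can be sketched in a few lines. This is only an illustration under stated assumptions: the names similarity and search_database and the threshold parameter are hypothetical, the database is modeled as an in-memory list of (name, sequence) pairs, and Python's difflib.SequenceMatcher stands in for a real alignment program such as those enumerated elsewhere in this specification.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Approximate similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def search_database(new_seq, database, threshold=0.9):
    """Return names of database sequences that are the 'same' as new_seq.

    'Same' means meeting the user-entered homology threshold, not strict identity.
    """
    matches = []
    for name, stored_seq in database:  # states 206 and 224: read first/next sequence
        if similarity(new_seq, stored_seq) >= threshold:  # states 210 and 212
            matches.append(name)  # state 214: report the matching name
    return matches  # state 220: no more sequences in the database


if __name__ == "__main__":
    # Toy database; the names and sequences are invented for illustration.
    toy_db = [("seqA", "ACGTACGTAC"), ("seqB", "TTTTTTTTTT"), ("seqC", "ACGTACGTAA")]
    print(search_database("ACGTACGTAC", toy_db, threshold=0.85))
```

The threshold plays the role of the homology parameters entered by the user.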

Accordingly, one aspect of the present invention is a computer system comprising a
processor, a data storage device having stored thereon a nucleic acid code of the invention or a
polypeptide code of the invention, a data storage device having retrievably stored thereon reference
nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of the
invention or polypeptide code of the invention and a sequence comparer for conducting the
comparison. The sequence comparer may indicate a homology level between the sequences
compared or identify structural motifs in the nucleic acid code of the invention and polypeptide
codes of the invention or it may identify structural motifs in sequences which are compared to these
nucleic acid codes and polypeptide codes. In some embodiments, the data storage device may have
stored thereon the sequences of at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or polypeptide codes of the invention.

Another aspect of the present invention is a method for determining the level of homology
between a nucleic acid code of the invention and a reference nucleotide sequence, comprising the
steps of reading the nucleic acid code and the reference nucleotide sequence through the use of a
computer program which determines homology levels and determining homology between the nucleic
acid code and the reference nucleotide sequence with the computer program. The computer program
may be any of a number of computer programs for determining homology levels, including those
specifically enumerated herein, including BLAST2N with the default parameters or with any modified
parameters. The method may be implemented using the computer systems described above. The
method may also be performed by reading 2, 5, 10, 15, 20, 25, 30, or 50 of the above described nucleic
acid codes of the invention through the use of the computer program and determining homology
between the nucleic acid codes and reference nucleotide sequences.

Figure 3 is a flow diagram illustrating one embodiment of a process 250 in a computer for
determining whether two sequences are homologous. The process 250 begins at a start state 252 and
then moves to a state 254 wherein a first sequence to be compared is stored to a memory. The
second sequence to be compared is then stored to a memory at a state 256. The process 250 then
moves to a state 260 wherein the first character in the first sequence is read and then to a state 262
wherein the first character of the second sequence is read. It should be understood that if the
sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U. If
the sequence is a protein sequence, then it should be in the single letter amino acid code so that the
first and second sequences can be easily compared.

A determination is then made at a decision state 264 whether the two characters are the
same. If they are the same, then the process 250 moves to a state 268 wherein the next characters in
the first and second sequences are read. A determination is then made whether the next characters
are the same. If they are, then the process 250 continues this loop until two characters are not the
same. If a determination is made that the next two characters are not the same, the process 250
moves to a decision state 274 to determine whether there are any more characters in either sequence to
read.

If there aren't any more characters to read, then the process 250 moves to a state 276
wherein the level of homology between the first and second sequences is displayed to the user. The
level of homology is determined by calculating the proportion of characters between the sequences
that were the same out of the total number of characters in the first sequence. Thus, if every
character in a first 100 nucleotide sequence aligned with every character in a second sequence, the
homology level would be 100%.
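
The homology level computed by process 250 can be written compactly. This is a minimal sketch, assuming an ungapped, position-by-position comparison and taking the length of the first sequence as the denominator; the function name homology_level is hypothetical.

```python
def homology_level(first, second):
    """Percentage of positions in `first` whose character matches `second`.

    Characters are compared position by position without gaps; positions of
    `first` that extend beyond the end of `second` count as mismatches.
    """
    if not first:
        raise ValueError("first sequence must not be empty")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * same / len(first)


if __name__ == "__main__":
    print(homology_level("ACGT", "ACGT"))
    print(homology_level("ACGT", "ACGA"))
```

Real comparison programs also introduce gaps and apply scoring matrices, which this sketch deliberately omits.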

Alternatively, the computer program may be a computer program which compares the
nucleotide sequences of the nucleic acid codes of the present invention to reference nucleotide
sequences in order to determine whether the nucleic acid code of the invention differs from a reference
nucleic acid sequence at one or more positions. Optionally such a program records the length and
identity of inserted, deleted or substituted nucleotides with respect to the sequence of either the
reference polynucleotide or the nucleic acid code of the invention. In one embodiment, the computer
program may be a program which determines whether the nucleotide sequences of the nucleic acid
codes of the invention contain one or more single nucleotide polymorphisms (SNP) with respect to a
reference nucleotide sequence. These single nucleotide polymorphisms may each comprise a single
base substitution, insertion, or deletion.
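
A program of this kind can be sketched as follows. The example is illustrative only: the function name list_differences is hypothetical, and difflib.SequenceMatcher from the Python standard library is used as a simple stand-in for a dedicated alignment tool, so its output on long or repetitive sequences should not be read as a biologically optimal alignment.

```python
from difflib import SequenceMatcher

# Map difflib opcode tags to the terms used in the text.
_KINDS = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}


def list_differences(reference, query):
    """Return (kind, reference_position, reference_bases, query_bases) tuples.

    reference_position is 1-based. Hypothetical helper for illustration only.
    """
    matcher = SequenceMatcher(None, reference, query, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        differences.append((_KINDS[tag], i1 + 1, reference[i1:i2], query[j1:j2]))
    return differences


if __name__ == "__main__":
    # Toy sequences, not taken from the sequence listing.
    print(list_differences("ACGTACGT", "ACGTCCGT"))
```

The length of each recorded change is the length of the reported reference or query bases.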

Another aspect of the present invention is a method for determining the level of homology
between a polypeptide code of the invention and a reference polypeptide sequence, comprising the
steps of reading the polypeptide code of the invention and the reference polypeptide sequence through
use of a computer program which determines homology levels and determining homology between the
polypeptide code and the reference polypeptide sequence using the computer program.

Accordingly, another aspect of the present invention is a method for determining whether a
nucleic acid code of the invention differs at one or more nucleotides from a reference nucleotide
sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence
through use of a computer program which identifies differences between nucleic acid sequences and
identifying differences between the nucleic acid code and the reference nucleotide sequence with the
computer program. In some embodiments, the computer program is a program which identifies single
nucleotide polymorphisms. The method may be implemented by the computer systems described above
and the method illustrated in Figure 3. The method may also be performed by reading at least 2, 5, 10,
15, 20, 25, 30, or 50 of the nucleic acid codes of the invention and the reference nucleotide sequences
through the use of the computer program and identifying differences between the nucleic acid codes and
the reference nucleotide sequences with the computer program.

In other embodiments the computer based system may further comprise an identifier for
identifying features within the nucleotide sequences of the nucleic acid codes of the invention or the
amino acid sequences of the polypeptide codes of the invention.

An "identifier" refers to one or more programs which identify certain features within the
above-described nucleotide sequences of the nucleic acid codes of the invention or the amino acid
sequences of the polypeptide codes of the invention. In one embodiment, the identifier may
comprise a program which identifies an open reading frame in the cDNA codes of the invention.
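
An open reading frame identifier of the kind mentioned above could look like the following sketch. It makes several simplifying assumptions that are not in the source: only the three forward reading frames are scanned, an ORF is taken to run from the first ATG in a frame to the next in-frame stop codon (TAA, TAG or TGA), and the function name find_orfs is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=1):
    """Return (start, end, orf) for forward-strand ORFs, 1-based inclusive.

    min_codons is the minimum number of codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiation codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in the same frame
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence for illustration.
    print(find_orfs("CCATGAAATGGTAACC"))
```

Scanning the reverse complement in the same way would cover the remaining three frames.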

Figure 4 is a flow diagram illustrating one embodiment of an identifier process 300 for
detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and
then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a
memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a
database of sequence features is opened. Such a database would include a list of each feature's
attributes along with the name of the feature. For example, a feature name could be "Initiation
Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA
Box" and the feature attribute would be "TAATAA". An example of such a database is produced by
the University of Wisconsin Genetics Computer Group (www.gcg.com).

Once the database of features is opened at the state 306, the process 300 moves to a state
308 wherein the first feature is read from the database. A comparison of the attribute of the first
feature with the first sequence is then made at a state 310. A determination is then made at a
decision state 316 whether the attribute of the feature was found in the first sequence. If the attribute
was found, then the process 300 moves to a state 318 wherein the name of the found feature is
displayed to the user.

The process 300 then moves to a decision state 320 wherein a determination is made
whether more features exist in the database. If no more features exist, then the process 300
terminates at an end state 324. However, if more features do exist in the database, then the process
300 reads the next sequence feature at a state 326 and loops back to the state 310 wherein the
attribute of the next feature is compared against the first sequence.

It should be noted that if the feature attribute is not found in the first sequence at the
decision state 316, the process 300 moves directly to the decision state 320 in order to determine if
any more features exist in the database.
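
The identifier process 300 reduces to a loop over a feature database. The sketch below is a minimal illustration: the two features are the examples given in the text, while the in-memory list FEATURE_DB, the function name identify_features, and the use of exact substring matching are assumptions made for the example.

```python
# Feature name and attribute pairs taken from the examples in the text.
FEATURE_DB = [("Initiation Codon", "ATG"), ("TAATAA Box", "TAATAA")]


def identify_features(sequence, feature_db=FEATURE_DB):
    """Return (feature_name, position) for each feature attribute found.

    position is the 1-based index of the first occurrence in the sequence.
    """
    sequence = sequence.upper()
    found = []
    for name, attribute in feature_db:  # states 308 and 326: read first/next feature
        index = sequence.find(attribute)  # state 310: compare attribute with sequence
        if index != -1:  # decision state 316
            found.append((name, index + 1))  # state 318: report the feature name
    return found  # end state 324: no more features in the database


if __name__ == "__main__":
    print(identify_features("ccTAATAAggATGcc"))
```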

In another embodiment, the identifier may comprise a molecular modeling program which
determines the 3-dimensional structure of the polypeptide codes of the invention. In some
embodiments, the molecular modeling program identifies target sequences that are most compatible
with profiles representing the structural environments of the residues in known three-dimensional
protein structures. (See, e.g., Eisenberg et al., U.S. Patent No. 5,436,850 issued July 25, 1995). In
another technique, the known three-dimensional structures of proteins in a given family are
superimposed to define the structurally conserved regions in that family. This protein modeling
technique also uses the known three-dimensional structure of a homologous protein to approximate
the structure of the polypeptide codes of the invention. (See, e.g., Srinivasan et al., U.S. Patent
No. 5,557,535 issued September 17, 1996). Conventional homology modeling techniques have been
used routinely to build models of proteases and antibodies. (Sowdhamini et al., 1997). Comparative
approaches can also be used to develop three-dimensional protein models when the protein of
interest has poor sequence identity to template proteins. In some cases, proteins fold into similar
three-dimensional structures despite having very weak sequence identities. For example, the
three-dimensional structures of a number of helical cytokines fold in similar three-dimensional
topology in spite of weak sequence homology.

The recent development of threading methods now enables the identification of likely
folding patterns in a number of situations where the structural relatedness between target and
template(s) is not detectable at the sequence level. Hybrid methods may also be used, in which fold
recognition is performed using Multiple Sequence Threading (MST), structural equivalencies are
deduced from the threading output using a distance geometry program DRAGON to construct a low
resolution model, and a full-atom representation is constructed using a molecular modeling package
such as QUANTA.

According to this 3-step approach, candidate templates are first identified by using the novel
fold recognition algorithm MST, which is capable of performing simultaneous threading of multiple
aligned sequences onto one or more 3-D structures. In a second step, the structural equivalencies
obtained from the MST output are converted into interresidue distance restraints and fed into the
distance geometry program DRAGON, together with auxiliary information obtained from secondary
structure predictions. The program combines the restraints in an unbiased manner and rapidly
generates a large number of low resolution model conformations. In a third step, these low
resolution model conformations are converted into full-atom models and subjected to energy
minimization using the molecular modeling package QUANTA. (See, e.g., Aszodi et al., 1997).

The results of the molecular modeling analysis may then be used in rational drug design
techniques to identify agents which modulate the activity of the polypeptide codes of the invention.

Accordingly, another aspect of the present invention is a method of identifying a feature
within the nucleic acid codes of the invention or the polypeptide codes of the invention comprising
reading the nucleic acid code(s) or the polypeptide code(s) through the use of a computer program
which identifies features therein and identifying features within the nucleic acid code(s) or
polypeptide code(s) with the computer program. In one embodiment, the computer program comprises a
computer program which identifies open reading frames. In a further embodiment, the computer
program identifies structural motifs in a polypeptide sequence. In another embodiment, the
computer program comprises a molecular modeling program. The method may be performed by
reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 50 of the nucleic acid codes of the
invention or the polypeptide codes of the invention through the use of the computer program and
identifying features within the nucleic acid codes or polypeptide codes with the computer program.

The nucleic acid codes of the invention or the polypeptide codes of the invention may be
stored and manipulated in a variety of data processor programs in a variety of formats. For example,
they may be stored as text in a word processing file, such as Microsoft WORD or WORDPERFECT or
as an ASCII file in a variety of database programs familiar to those of skill in the art, such as DB2,
SYBASE, or ORACLE. In addition, many computer programs and databases may be used as sequence
comparers, identifiers, or sources of reference nucleotide or polypeptide sequences to be compared to
the nucleic acid codes of the invention or the polypeptide codes of the invention. The following list is
intended not to limit the invention but to provide guidance to programs and databases which are useful
with the nucleic acid codes of the invention or the polypeptide codes of the invention. The programs
and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase
(Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular
Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI),
BLASTN and BLASTX (Altschul et al., 1990), FASTA (Pearson and Lipman, 1988), FASTDB
(Brutlag et al., 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations
Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight
II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular
Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.),
QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler
(Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular
Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.),
the EMBL/Swissprotein database, the MDL Available Chemicals Directory database, the MDL Drug
Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug
Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database.
Many other programs and databases would be apparent to one of skill in the art given the present
disclosure.

Motifs which may be detected using the above programs include sequences encoding
leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and
beta sheets; signal sequences encoding signal peptides which direct the secretion of the encoded
proteins; sequences implicated in transcription regulation such as homeoboxes; acidic stretches;
enzymatic active sites; substrate binding sites; and enzymatic cleavage sites.
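
By way of a hedged example, one of the listed motif types, the N-glycosylation site, can be searched for in a polypeptide code with a regular expression. The pattern below is the commonly used consensus sequon N-{P}-[ST]-{P}; the function name and the test peptides are hypothetical and do not come from SEQ ID No 4.

```python
import re

# Consensus N-glycosylation sequon N-{P}-[ST]-{P}. The lookahead lets
# overlapping candidate sites be reported as separate hits.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_n_glycosylation_sites(protein):
    """Return the 1-based positions of the Asn residue of each candidate site."""
    return [match.start() + 1 for match in N_GLYCOSYLATION.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide: N3 is followed by A-T-A (a hit), N8 by P (no hit).
    print(find_n_glycosylation_sites("MKNATAANPSA"))
```

A match only flags a candidate site; whether the residue is actually glycosylated cannot be decided from the sequence alone.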

Throughout this application, various publications, patents and published patent applications
are cited. The disclosures of these publications, patents and published patent specifications
referenced in this application are hereby incorporated by reference into the present disclosure to
more fully describe the state of the art to which this invention pertains.

EXAMPLES 
Example 1 

Identification Of Biallelic Markers - DNA Extraction 

Donors were unrelated and healthy. They were sufficiently diverse to be
representative of a heterogeneous French population. The DNA from 100 individuals was extracted
and tested for the detection of the biallelic markers.

30 ml of peripheral venous blood were taken from each donor in the presence of EDTA.
Cells (pellet) were collected after centrifugation for 10 minutes at 2000 rpm. Red cells were lysed
by a lysis solution (50 ml final volume: 10 mM Tris pH 7.6; 5 mM MgCl2; 10 mM NaCl). The
solution was centrifuged (10 minutes, 2000 rpm) as many times as necessary to eliminate the
residual red cells present in the supernatant, after resuspension of the pellet in the lysis solution.

The pellet of white cells was lysed overnight at 42°C with 3.7 ml of lysis solution composed
of:

- 3 ml TE 10-2 (Tris-HCl 10 mM, EDTA 2 mM) / NaCl 0.4 M
- 200 µl SDS 10%
- 500 µl K-proteinase (2 mg K-proteinase in TE 10-2 / NaCl 0.4 M).

For the extraction of proteins, 1 ml saturated NaCl (6 M) (1/3.5 v/v) was added. After
vigorous agitation, the solution was centrifuged for 20 minutes at 10000 rpm.
For the precipitation of DNA, 2 to 3 volumes of 100% ethanol were added to the previous
supernatant, and the solution was centrifuged for 30 minutes at 2000 rpm. The DNA solution was
rinsed three times with 70% ethanol to eliminate salts, and centrifuged for 20 minutes at 2000 rpm.
The pellet was dried at 37°C, and resuspended in 1 ml TE 10-1 or 1 ml water. The DNA
concentration was evaluated by measuring the OD at 260 nm (1 unit OD = 50 µg/ml DNA).

To determine the presence of proteins in the DNA solution, the OD 260 / OD 280 ratio was
determined. Only DNA preparations having an OD 260 / OD 280 ratio between 1.8 and 2 were used
in the subsequent examples described below.
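
The quantification and purity criteria above can be written out as a short calculation. This is only a sketch: it assumes the usual conversion of 50 µg/ml of double-stranded DNA per OD 260 unit, and the function names and the dilution factor parameter are introduced here for illustration.

```python
def dna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Estimate the double-stranded DNA concentration from the OD at 260 nm,
    assuming 1 OD unit corresponds to 50 ug/ml."""
    return od260 * 50.0 * dilution_factor


def passes_purity_check(od260, od280, low=1.8, high=2.0):
    """Apply the acceptance window on the OD 260 / OD 280 ratio (1.8 to 2)."""
    if od280 <= 0:
        raise ValueError("OD 280 must be positive")
    return low <= od260 / od280 <= high


if __name__ == "__main__":
    # Hypothetical readings taken on a 20-fold dilution of a preparation.
    print(dna_concentration_ug_per_ml(0.5, dilution_factor=20))
    print(passes_purity_check(0.95, 0.5))
```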

The pool was constituted by mixing equivalent quantities of DNA from each individual. 

Example 2

Identification Of Biallelic Markers: Amplification Of Genomic DNA By PCR
The amplification of specific genomic sequences of the DNA samples of example 1 was
carried out on the pool of DNA obtained previously. In addition, 50 individual samples were
similarly amplified.
PCR assays were performed using the following protocol:



Final volume                                            25 µl
DNA                                                     2 ng/µl
MgCl2                                                   2 mM
dNTP (each)                                             200 µM
primer (each)                                           2.9 ng/µl
Ampli Taq Gold DNA polymerase                           0.05 unit/µl
PCR buffer (10x = 0.1 M Tris-HCl pH 8.3, 0.5 M KCl)     1x



Each pair of first primers was designed using the sequence information of the HKLP gene
disclosed herein and the OSP software (Hillier & Green, 1991). This first pair of primers was about
20 nucleotides in length and had the sequences disclosed in Table 1 in the columns labeled PU and
RP.

Preferably, the primers contained a common oligonucleotide tail upstream of the specific
bases targeted for amplification which was useful for sequencing.
Primers PU contain the following additional PU 5' sequence:
TGTAAAACGACGGCCAGT; primers RP contain the following RP 5' sequence:
CAGGAAACAGCTATGACC. The primer containing the additional PU 5' sequence is listed in
SEQ ID No 9. The primer containing the additional RP 5' sequence is listed in SEQ ID No 10.
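
The tailing scheme can be illustrated with a few lines of code. The two tail sequences are those given above; the gene-specific sequence in the usage example is a made-up placeholder, and the helper name is an assumption of this sketch.

```python
PU_TAIL = "TGTAAAACGACGGCCAGT"  # additional PU 5' sequence given above
RP_TAIL = "CAGGAAACAGCTATGACC"  # additional RP 5' sequence given above


def add_tail(tail, specific_sequence):
    """Return the full primer, 5' to 3': common tail followed by the
    gene-specific bases targeted for amplification."""
    specific_sequence = specific_sequence.upper()
    if set(specific_sequence) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G or T")
    return tail + specific_sequence


if __name__ == "__main__":
    # Placeholder 20-mer; the actual PU and RP sequences are those of Table 1.
    print(add_tail(PU_TAIL, "acgtacgtacgtacgtacgt"))
```

Because every amplicon then carries the same two tails, a single pair of sequencing primers complementary to the tails can in principle be used for all fragments.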

The synthesis of these primers was performed following the phosphoramidite method, on a
GENSET UFPS 24.1 synthesizer.
DNA amplification was performed on a Genius II thermocycler. After heating at 95°C for
10 min, 40 cycles were performed. Each cycle comprised: 30 sec at 95°C, 54°C for 1 min, and 30
sec at 72°C. For final elongation, 10 min at 72°C ended the amplification. The quantities of the
amplification products obtained were determined on 96-well microtiter plates, using a fluorometer
and Picogreen as intercalating agent (Molecular Probes).
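
As a quick arithmetic check, the programmed hold time of this cycling protocol can be computed as follows. The helper is hypothetical, and temperature ramping between steps, which depends on the instrument, is deliberately ignored.

```python
def program_seconds(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time of a PCR program, excluding ramp times."""
    return initial_s + cycles * sum(step_seconds) + final_s


# 10 min at 95 C, then 40 cycles of 30 s / 60 s / 30 s, then 10 min at 72 C.
TOTAL_SECONDS = program_seconds(10 * 60, 40, (30, 60, 30), 10 * 60)

if __name__ == "__main__":
    print(TOTAL_SECONDS / 60, "minutes")
```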



Table 1

Amplicon    Position range     PU       Position range of       RP       Complementary position
            of the amplicon    primer   amplification primer    primer   range of amplification
            in SEQ ID No 1     name     in SEQ ID No 1          name     primer in SEQ ID No 1

12-809      7041-7565          B1       7041-7060               C1       7545-7565
12-805      16004-16483        B2       16004-16024             C2       16464-16483
12-790      26232-26705        B3       26232-26252             C3       26685-26705
12-791      30902-31301        B4       30902-30922             C4       31282-31301
12-803      33476-33932        B5       33476-33496             C5       33912-33932
99-33040    33934-34383        B6       33934-33953             C6       34364-34383
12-810      34918-35369        B7       34918-34938             C7       35350-35369
12-787      45465-45993        B8       45465-45485             C8       45974-45993
12-793      54361-54879        B9       54361-54381             C9       54861-54879
12-792      56365-56813        B10      56365-56385             C10      56793-56813
99-41009    60111-60580        B11      60111-60130             C11      60562-60580
12-593      78815-79349        B12      78815-78835             C12      79332-79349
12-589      79706-80167        B13      79706-79726             C13      80149-80167
12-785      87264-87773        B14      87264-87284             C14      87753-87773
12-588      101161-101704      B15      101161-101178           C15      101686-101704
12-603      104131-104578      B16      104131-104151           C16      104558-104578
12-586      117017-117501      B17      117017-117036           C17      117481-117501

Amplicon    Position range     PU       Position range of       RP       Complementary position
            of the amplicon    primer   amplification primer    primer   range of amplification
            in SEQ ID No 2     name     in SEQ ID No 2          name     primer in SEQ ID No 2

12-602      2203-2620          B18      2203-2221               C18      2600-2620
12-587      4479-4878          B19      4479-4499               C19      4858-4878
12-596      5996-6443          B20      5996-6015               C20      6423-6443
12-808      10079-10543        B21      10079-10098             C21      10523-10543

Amplicon    Position range     Primer   Position range of       Primer   Complementary position    SEQ ID
            of the amplicon    name     amplification primer    name     range of amplification
                                                                         primer

10-265      1-357              B22      1-18                    C22      338-357                   No 5
10-266      1-420              B23      1-20                    C23      401-420                   No 6
12-592      1-465              B24      1-19                    C24      448-465                   No 7
12-783      1-449              B25      1-19                    C25      429-449                   No 8
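
The coordinates of Table 1 lend themselves to a simple consistency check: the PU primer should start where the amplicon starts, and the RP primer should end where the amplicon ends. The sketch below encodes two rows of the table; the class and function names are assumptions introduced for illustration.

```python
from typing import NamedTuple, Tuple


class Amplicon(NamedTuple):
    name: str
    start: int             # first position of the amplicon
    end: int               # last position of the amplicon
    pu: Tuple[int, int]    # position range of the PU primer
    rp: Tuple[int, int]    # complementary position range of the RP primer


def span_length(start, end):
    """Length of an inclusive, 1-based position range."""
    return end - start + 1


def is_consistent(amplicon):
    """The PU primer opens the amplicon, the RP primer closes it, and the
    two primers do not overlap."""
    return (amplicon.pu[0] == amplicon.start
            and amplicon.rp[1] == amplicon.end
            and amplicon.pu[1] < amplicon.rp[0])


# Two rows transcribed from Table 1 (SEQ ID No 1 and SEQ ID No 2 respectively).
ROWS = [
    Amplicon("12-809", 7041, 7565, (7041, 7060), (7545, 7565)),
    Amplicon("12-602", 2203, 2620, (2203, 2221), (2600, 2620)),
]

if __name__ == "__main__":
    for row in ROWS:
        print(row.name, span_length(row.start, row.end), is_consistent(row))
```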
Example 3

Identification Of Biallelic Markers - Sequencing Of Amplified Genomic DNA And
Identification Of Polymorphisms
The sequencing of the amplified DNA obtained in example 2 was carried out on ABI 377
sequencers. The sequences of the amplification products were determined using automated dideoxy
terminator sequencing reactions with a dye terminator cycle sequencing protocol. The products of
the sequencing reactions were run on sequencing gels and the sequences were determined using gel
image analysis (ABI Prism DNA Sequencing Analysis software (2.12 version)).

The sequence data were further evaluated to detect the presence of biallelic markers within
the amplified fragments. The polymorphism search was based on the presence of superimposed
peaks in the electrophoresis pattern resulting from different bases occurring at the same position as
described previously.

In the 25 fragments of amplification, 32 biallelic markers were detected. The localization of
these biallelic markers is shown in Table 2.

Example 4

Validation Of The Polymorphisms Through Microsequencing
The biallelic markers identified in example 3 were further confirmed and their respective
frequencies were determined through microsequencing. Microsequencing was carried out for each
individual DNA sample described in Example 1.
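
Once each individual has been genotyped, allele frequencies follow by simple counting, since every individual contributes two alleles per biallelic marker. The sketch below is illustrative; the function name and the genotypes in the usage example are invented and are not results of the present examples.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies from diploid genotypes given as pairs of
    alleles, e.g. ("A", "G") for a heterozygote."""
    counts = Counter()
    for allele_1, allele_2 in genotypes:
        counts[allele_1] += 1
        counts[allele_2] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical genotypes of four individuals at an A/G marker.
    sample = [("A", "A"), ("A", "G"), ("A", "A"), ("G", "G")]
    print(allele_frequencies(sample))
```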

Amplification from genomic DNA of individuals was performed by PCR as described above
for the detection of the biallelic markers with the same set of PCR primers (Table 1).

The preferred primers used in microsequencing were about 19 nucleotides in length and
hybridized just upstream of the considered polymorphic base. According to the invention, the
primers used in microsequencing are detailed in Table 4.

The microsequencing reaction was performed as follows:

After purification of the amplification products, the microsequencing reaction mixture was
prepared by adding, in a 20 µl final volume: 10 pmol microsequencing oligonucleotide, 1 U
Thermosequenase (Amersham E79000G), 1.25 µl Thermosequenase buffer (260 mM Tris HCl pH
9.5, 65 mM MgCl2), and the two appropriate fluorescent ddNTPs (Perkin Elmer, Dye Terminator Set
401095) complementary to the nucleotides at the polymorphic site of each biallelic marker tested,
following the manufacturer's recommendations. After 4 minutes at 94°C, 20 PCR cycles of 15 sec
at 55°C, 5 sec at 72°C, and 10 sec at 94°C were carried out in a Tetrad PTC-225 thermocycler (MJ
Research). The unincorporated dye terminators were then removed by ethanol precipitation.
Samples were finally resuspended in formamide-EDTA loading buffer and heated for 2 min at 95°C
before being loaded on a polyacrylamide sequencing gel. The data were collected by an ABI
PRISM 377 DNA sequencer and processed using the GENESCAN software (Perkin Elmer).

Following gel analysis, data were automatically processed with software that allows the
determination of the alleles of biallelic markers present in each amplified fragment.

The software evaluates such factors as whether the intensities of the signals resulting from
the above microsequencing procedures are weak, normal, or saturated, or whether the signals are
ambiguous. In addition, the software identifies significant peaks (according to shape and height
criteria). Among the significant peaks, peaks corresponding to the targeted site are identified based
on their position. When two significant peaks are detected for the same position, each sample is
categorized as homozygous or heterozygous based on the height ratio.
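
The classification step can be sketched as follows. This is not the software referred to above: the function name, the minimum peak height and the height ratio threshold are arbitrary placeholders chosen for illustration, and the real criteria also take peak shape and signal saturation into account.

```python
def call_genotype(height_allele_1, height_allele_2,
                  min_height=100.0, het_ratio=0.3):
    """Classify a sample from the peak heights measured at the targeted site
    for the two alleles of a biallelic marker.

    min_height: hypothetical height above which a peak counts as significant.
    het_ratio:  hypothetical minimum (smaller / larger) height ratio for two
                significant peaks to be read as a heterozygote.
    """
    significant_1 = height_allele_1 >= min_height
    significant_2 = height_allele_2 >= min_height
    if not (significant_1 or significant_2):
        return "no call"  # signal too weak to be interpreted
    if significant_1 and significant_2:
        smaller = min(height_allele_1, height_allele_2)
        larger = max(height_allele_1, height_allele_2)
        if smaller / larger >= het_ratio:
            return "heterozygous"
    # One dominant peak: report the allele with the higher signal.
    if height_allele_1 > height_allele_2:
        return "homozygous allele 1"
    return "homozygous allele 2"


if __name__ == "__main__":
    print(call_genotype(1000.0, 900.0))
    print(call_genotype(1000.0, 50.0))
```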
Table 2



Amplicon    BM     Marker Name      Localization      Polymorphism      BM position in SEQ ID
                                    in HKLP gene      all1     all2     No 1       No 3

12-809      A1     12-809-119       Exon 3            G        C        7159       471
12-805      A2     12-805-115       Intron 5          A        G        16369
12-790      A3     12-790-396       Intron 11         A        G        26310
12-791      A4     12-791-211       Intron 14         A        G        31112
12-803      A5     12-803-125       Intron 14         A        T        33808
99-33040    A6     99-33040-321     Intron 14         C        T        34255
12-810      A7     12-810-77        Intron 14         A        G        35293
12-787      A8     12-787-103       Intron 21         A        G        45892
12-793      A9     12-793-383       Intron 21         G        T        54497
12-792      A10    12-792-233       Intron 21         A        G        56582
99-41009    A11    99-41009-244     Intron 21         A        G        60336
99-41009    A12    99-41009-111     Intron 21         C        T        60469
12-593      A13    12-593-287       Intron 26                  TAAAT    79063
12-593      A14    12-593-174       Intron 26         C        T        79176
12-589      A15    12-589-152       Intron 26         G        T        80016
12-785      A16    12-785-200       Intron 30         C        T        87463
12-785      A17    12-785-393       Intron 30         A        G        87656
12-588      A18    12-588-103       Intron 36         A        G        101602
12-603      A19    12-603-191       Intron 37         C        T        104391
12-586      A20    12-586-414       Intron 43         A        G        117430
12-586      A21    12-586-443       Intron 43         -        C        117459

Amplicon    BM     Marker Name      Localization      Polymorphism      BM position in SEQ ID
                                    in HKLP gene      all1     all2     No 2       No 3

12-602      A22    12-602-196       Intron 46         C        T        2397
12-602      A23    12-602-350       Exon 47           A        C        2551       5487
12-587      A24    12-587-379       Exon 48           A        C        4500       6265
12-596      A25    12-596-124       Exon 48           A        G        6119       7887
12-808      A26    12-808-52        3' regulatory     A        G        10130
12-808      A27    12-808-75        3' regulatory     G        C        10153

Amplicon    BM     Marker Name      Localization      Polymorphism      BM         In SEQ
                                                      all1     all2     position   ID No

10-265      A28    10-265-178       intergenic        A        G        178        5
10-266      A29    10-266-203       intergenic        C        T        203        6
12-592      A30    12-592-118       intergenic        A        T        118        7
12-783      A31    12-783-421       intergenic        C        T        420        8
12-783      A32    12-783-73        intergenic        G        C        72         8

BM refers to "biallelic marker". All1 and all2 refer respectively to allele 1 and allele 2 of
the biallelic marker.
Table 3





BM     Marker Name      Position range of probes     Probes
                        in SEQ ID No 1

A1     12-809-119       7147-7171                    P1
A2     12-805-115       16357-16381                  P2
A3     12-790-396       26298-26322                  P3
A4     12-791-211       31100-31124                  P4
A5     12-803-125       33796-33820                  P5
A6     99-33040-321     34243-34267                  P6
A7     12-810-77        35281-35305                  P7
A8     12-787-103       45880-45904                  P8
A9     12-793-383       54485-54509                  P9
A10    12-792-233       56570-56594                  P10
A11    99-41009-244     60324-60348                  P11
A12    99-41009-111     60457-60481                  P12
A14    12-593-174       79164-79188                  P13
A15    12-589-152       80004-80028                  P14
A16    12-785-200       87451-87475                  P15
A17    12-785-393       87644-87668                  P16
A18    12-588-103       101590-101614                P17
A19    12-603-191       104379-104403                P18
A20    12-586-414       117418-117442                P19

BM     Marker Name      Position range of probes     Probes
                        in SEQ ID No 2

A22    12-602-196       2385-2409                    P20
A23    12-602-350       2539-2563                    P21
A24    12-587-379       4488-4512                    P22
A25    12-596-124       6107-6131                    P23
A26    12-808-52        10118-10142                  P24
A27    12-808-75        10141-10165                  P25

BM     Marker Name      Position range of probes     Probes     SEQ ID

A28    10-265-178       166-190                      P26        In SEQ ID No 5
A29    10-266-203       191-215                      P27        In SEQ ID No 6
A30    12-592-118       106-130                      P28        In SEQ ID No 7
A31    12-783-421       408-432                      P29        In SEQ ID No 8
A32    12-783-73        60-84                        P30        In SEQ ID No 8
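
The probe ranges of Table 3 are consistent with 25-nucleotide probes centered on the biallelic marker positions of Table 2. A minimal sketch of that relationship, with a hypothetical function name, is:

```python
def probe_range(marker_position, probe_length=25):
    """Inclusive position range of a probe centered on the biallelic marker.
    The length must be odd so that the marker sits exactly at the center."""
    if probe_length % 2 == 0:
        raise ValueError("probe length must be odd to center the marker")
    half = probe_length // 2
    return marker_position - half, marker_position + half


if __name__ == "__main__":
    # Marker A1 lies at position 7159 of SEQ ID No 1 (Table 2).
    print(probe_range(7159))
```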
Table 4



Marker Name      Biallelic    Mis. 1    Position range of      Mis. 2    Complementary position
                 Marker                 microsequencing                  range of microsequencing
                                        primer in SEQ ID                 primer in SEQ ID No 1
                                        No 1

12-809-119       A1           D1        7140-7158              E1        7160-7178
12-805-115       A2           D2        16350-16368            E2        16370-16388
12-790-396       A3           D3        26291-26309            E3        26311-26329
12-791-211       A4           D4        31093-31111            E4        31113-31131
12-803-125       A5           D5        33789-33807            E5        33809-33827
99-33040-321     A6           D6        34236-34254            E6        34256-34274
12-810-77        A7           D7        35274-35292            E7        35294-35312
12-787-103       A8           D8        45873-45891            E8        45893-45911
12-793-383       A9           D9        54478-54496            E9        54498-54516
12-792-233       A10          D10       56563-56581            E10       56583-56601
99-41009-244     A11          D11       60317-60335            E11       60337-60355
99-41009-111     A12          D12       60450-60468            E12       60470-60488
12-593-174       A14          D13       79157-79175            E13       79177-79195
12-589-152       A15          D14       79997-80015            E14       80017-80035
12-785-200       A16          D15       87444-87462            E15       87464-87482
12-785-393       A17          D16       87637-87655            E16       87657-87675
12-588-103       A18          D17       101583-101601          E17       101603-101621
12-603-191       A19          D18       104372-104390          E18       104392-104410
12-586-414       A20          D19       117411-117429          E19       117431-117449

Marker Name      Biallelic    Mis. 1    Position range of      Mis. 2    Complementary position
                 Marker                 microsequencing                  range of microsequencing
                                        primer in SEQ ID                 primer in SEQ ID No 2
                                        No 2

12-602-196       A22          D20       2378-2396              E20       2398-2416
12-602-350       A23          D21       2532-2550              E21       2552-2570
12-587-379       A24          D22       4481-4499              E22       4501-4519
12-596-124       A25          D23       6100-6118              E23       6120-6138
12-808-52        A26          D24       10111-10129            E24       10131-10149
12-808-75        A27          D25       10134-10152            E25       10154-10172

Marker Name      Biallelic    Mis. 1    Position range of      Mis. 2    Complementary position      SEQ ID
                 Marker                 microsequencing                  range of microsequencing
                                        primer                           primer

10-265-178       A28          D26       159-177                E26       179-197                     In SEQ ID No 5
10-266-203       A29          D27       184-202                E27       204-222                     In SEQ ID No 6
12-592-118       A30          D28       99-117                 E28       119-137                     In SEQ ID No 7
12-783-421       A31          D29       401-419                E29       421-439                     In SEQ ID No 8
12-783-73        A32          D30       53-71                  E30       73-91                       In SEQ ID No 8

Mis 1 and Mis 2 respectively refer to microsequencing primers which hybridized with the
non-coding strand of the HKLP gene or with the coding strand of the HKLP gene.
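
The coordinates of Table 4 are consistent with 19-nucleotide primers lying immediately on either side of the polymorphic base, so that the 3' end of each primer is adjacent to the marker. A hedged sketch of that computation, with a hypothetical function name, is:

```python
def microsequencing_primer_ranges(marker_position, primer_length=19):
    """Return the inclusive position ranges of the two microsequencing
    primers flanking a biallelic marker: the Mis. 1 range ends one base
    before the marker, and the Mis. 2 (complementary) range starts one
    base after it."""
    mis_1 = (marker_position - primer_length, marker_position - 1)
    mis_2 = (marker_position + 1, marker_position + primer_length)
    return mis_1, mis_2


if __name__ == "__main__":
    # Marker A1 lies at position 7159 of SEQ ID No 1 (Table 2).
    print(microsequencing_primer_ranges(7159))
```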
Example 5

Preparation of Antibody Compositions to the HKLP protein

Substantially pure protein or polypeptide is isolated from transfected or transformed cells
containing an expression vector encoding the HKLP protein or a portion thereof. The concentration of
protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to
the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be
prepared as follows:

A. Monoclonal Antibody Production by Hybridoma Fusion

Monoclonal antibody to epitopes in the HKLP protein or a portion thereof can be prepared from
murine hybridomas according to the classical method of Kohler, G. and Milstein, C., (1975) or
derivative methods thereof. Also see Harlow, E., and D. Lane. 1988.

Briefly, a mouse is repetitively inoculated with a few micrograms of the HKLP protein or a
portion thereof over a period of a few weeks. The mouse is then sacrificed, and the antibody producing
cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse
myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media
comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the
dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing
clones are identified by detection of antibody in the supernatant fluid of the wells by
immunoassay procedures, such as ELISA, as originally described by Engvall, (1980), and derivative
methods thereof. Selected positive clones can be expanded and their monoclonal antibody product
harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et
al. Basic Methods in Molecular Biology Elsevier, New York. Section 21-2.

B. Polyclonal Antibody Production by Immunization

Polyclonal antiserum containing antibodies to heterogeneous epitopes in the HKLP protein or a
portion thereof can be prepared by immunizing a suitable non-human animal with the HKLP protein or a
portion thereof, which can be unmodified or modified to enhance immunogenicity. A suitable
non-human animal is preferably a non-human mammal, usually a mouse, rat, rabbit, goat, or
horse. Alternatively, a crude preparation which has been enriched for HKLP concentration can be
used to generate antibodies. Such proteins, fragments or preparations are introduced into the
non-human mammal in the presence of an appropriate adjuvant (e.g. aluminum hydroxide, RIBI, etc.)
which is known in the art. In addition the protein, fragment or preparation can be pretreated with an
agent which will increase antigenicity; such agents are known in the art and include, for example,
methylated bovine serum albumin (mBSA), bovine serum albumin (BSA), Hepatitis B surface
antigen, and keyhole limpet hemocyanin (KLH). Serum from the immunized animal is collected,
treated and tested according to known procedures. If the serum contains polyclonal antibodies to
undesired epitopes, the polyclonal antibodies can be purified by immunoaffinity chromatography.

Effective polyclonal antibody production is affected by many factors related both to the
antigen and the host species. Also, host animals vary in response to site of inoculations and dose,
with both inadequate or excessive doses of antigen resulting in low titer antisera. Small doses (ng
level) of antigen administered at multiple intradermal sites appear to be most reliable. Techniques
for producing and processing polyclonal antisera are known in the art, see for example, Mayer and
Walker (1987). An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al.
(1971).

Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer
thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against
known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., (1973).
Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as
described, for example, by Fisher, D., (1980).

Antibody preparations prepared according to either the monoclonal or the polyclonal protocol
are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances
in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of
antigen in a biological sample. The antibodies may also be used in therapeutic compositions for killing
cells expressing the protein or reducing the levels of the protein in the body.

While the preferred embodiment of the invention has been illustrated and described, it will 
be appreciated that various changes can be made therein by the one skilled in the art without 
departing from the spirit and scope of the invention. 

FREE TEXT OF THE SEQUENCE LISTING 
The following free text appears in the accompanying Sequence Listing : 
3'regulatory region 
polymorphic base 
or 

complement 
probe 
deletion of 
insertion of 

sequencing oligonucleotide Primer 
Artificial Sequence 
CLAIMS

1. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of SEQ ID No 1 or the complementary sequence thereof, wherein said
contiguous span comprises either:

- at least 1 of the following nucleotide positions of SEQ ID No 1: 1-39624, 39705-40589,
40666-43629, 43710-44203, 44311-45125, 45210-45440, 45622-45717, 45791-68580, 68675-70246,
70396-72421, 72601-73295, 73434-74648, 74898-83055, 83175-85192, 85279-85609,
85740-85906, 86070-88304, 88396-90585, 90705-91767, 91824-94380, 94490-96296, 96364-97184,
97270-101167, 101274-109465, 109581-110228, 110363-111819, 111882-113636, 113783-113945,
114186-117002, 117075-119676, and 119677-121162; or,

- a G at position 7159 of SEQ ID No 1.

2. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of SEQ ID No 2 or the complementary sequence thereof, wherein said
contiguous span comprises:

- at least 1 of the following nucleotide positions of SEQ ID No 2: 1-1600, 1751-2138, 2332-2539,
2659-3829 and 8885-10884; or,

- a C either at position 2551 or 4500 of SEQ ID No 2.

3. An isolated, purified or recombinant polynucleotide comprising a contiguous span of at
least 12 nucleotides of SEQ ID No 3 or the complementary sequence thereof, wherein said
contiguous span comprises either:

- at least 1 of the following nucleotide positions of SEQ ID No 3: 391-1619 and 6988-10682;
or,

- a C at position 5487, or a C at position 6265 of SEQ ID No 3.

4. An isolated, purified, or recombinant polynucleotide consisting essentially of a
contiguous span of 8 to 50 nucleotides of SEQ ID Nos 1-3 and 5-8 or the complement thereof,
wherein said span includes a HKLP-related biallelic marker in said sequence.

5. A polynucleotide according to claim 4, wherein said HKLP-related biallelic marker is
selected from the group consisting of A1 to A32, and the complements thereof.

6. A polynucleotide according to claim 4, wherein said HKLP-related biallelic marker is
selected from the group consisting of A1 to A22 and A25 to A32, and the complements thereof.

7. A polynucleotide according to claim 4, wherein said HKLP-related biallelic marker is
selected from the group consisting of A23 and A24, and the complements thereof.

8. A polynucleotide according to any one of claims 4 to 7, wherein said contiguous span is
18 to 35 nucleotides in length and said biallelic marker is within 4 nucleotides of the center of said
polynucleotide.

9. A polynucleotide according to claim 8, wherein said polynucleotide consists of said
contiguous span and said contiguous span is 25 nucleotides in length and said biallelic marker is at
the center of said polynucleotide.



10. A polynucleotide according to claim 9, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: P1 to P30, and the complementary sequences
thereto.

11. A polynucleotide according to any one of claims 4 to 7, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at
the 3' end of said polynucleotide.

12. A polynucleotide according to any one of claims 1 to 3, wherein the 3' end of said
contiguous span is present at the 3' end of said polynucleotide.

13. A polynucleotide according to claim 12, wherein the 3' end of said polynucleotide is
located within 20 nucleotides upstream of a HKLP-related biallelic marker in said sequence.

14. A polynucleotide according to claim 13, wherein the 3' end of said polynucleotide is
located 1 nucleotide upstream of said HKLP-related biallelic marker in said sequence.

15. A polynucleotide according to claim 14, wherein said polynucleotide consists essentially
of a sequence selected from the following sequences: D1 to D30, and E1 to E30.

16. An isolated, purified, or recombinant polynucleotide consisting essentially of a sequence
selected from the following sequences: B1 to B19 and C1 to C25.

17. An isolated, purified, or recombinant polynucleotide which encodes a polypeptide
comprising a contiguous span of at least 6 amino acids of SEQ ID No 4, wherein said contiguous
span includes at least 1 of the amino acid positions 1-478 of the SEQ ID No 4.

18. A polynucleotide according to any one of claims 1 to 17 attached to a solid support.

19. An array of polynucleotides comprising at least one polynucleotide according to claim
18.

20. An array according to claim 19, wherein said array is addressable. 

21. A polynucleotide according to any one of claims 1 to 17 further comprising a label.

22. A recombinant vector comprising a polynucleotide according to any one of claims 1 to
17.

23. A host cell comprising a recombinant vector according to claim 22.

24. A non-human host animal or mammal comprising a recombinant vector according to
claim 22.

25. A mammalian host cell comprising an HKLP gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1 to 17.

26. A non-human host mammal comprising a HKLP gene disrupted by homologous
recombination with a knock out vector, comprising a polynucleotide according to any one of claims
1 to 17.

27. A method of genotyping comprising determining the identity of a nucleotide at a
HKLP-related biallelic marker or the complement thereof in a biological sample.

28. A method according to claim 27, wherein said biological sample is derived from a single
subject.

29. A method according to claim 28, wherein the identity of the nucleotides at said biallelic 
marker is determined for both copies of said biallelic marker present in said individual's genome. 

35 

30. A method according to claim 27, wherein said biological sample is derived from
multiple subjects.

31. A method according to claim 27, further comprising amplifying a portion of said
sequence comprising the biallelic marker prior to said determining step.

32. A method according to claim 31, wherein said amplifying is performed by PCR.

33. A method according to claim 27, wherein said determining is performed by a 
hybridization assay. 

34. A method according to claim 27, wherein said determining is performed by a sequencing
assay.

35. A method according to claim 27, wherein said determining is performed by a 
microsequencing assay. 

36. A method according to claim 27, wherein said determining is performed by an enzyme- 
based mismatch detection assay. 

37. A method of estimating the frequency of an allele of a HKLP-related biallelic marker in
a population comprising:

a) genotyping individuals from said population for said biallelic marker according to the 
method of claim 27; and 

b) determining the proportional representation of said biallelic marker in said population. 
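
By way of non-limiting illustration, the counting step b) of claim 37 may be sketched as follows. The sketch assumes that each individual has already been genotyped as a two-letter string (for example "AG"); the function name `allele_frequencies` and the sample data are hypothetical and do not appear in the specification.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one biallelic marker.

    genotypes: iterable of 2-character strings, one per individual,
    each character being one of the two alleles carried (e.g. "AG").
    Returns a dict mapping each observed allele to its proportion.
    """
    counts = Counter()
    for genotype in genotypes:
        if len(genotype) != 2:
            raise ValueError(f"expected a diploid genotype, got {genotype!r}")
        counts.update(genotype)  # adds one count per chromosome copy
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical sample of four individuals (eight chromosomes).
sample = ["AA", "AG", "GG", "AG"]
freqs = allele_frequencies(sample)
```

In this made-up sample each allele is carried by four of the eight chromosomes.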

38. A method of detecting an association between a genotype and a trait, comprising the
steps of:

a) determining the frequency of at least one HKLP-related biallelic marker in a trait positive
population according to the method of claim 37;

b) determining the frequency of said HKLP-related biallelic marker in a control population
according to the method of claim 37; and

c) determining whether a statistically significant association exists between said genotype 
and said trait. 
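
By way of non-limiting illustration, step c) of claim 38 may be sketched as a chi-square test on a 2 x 2 table of allele counts. The claim does not prescribe a particular statistical test; the choice of a one-degree-of-freedom chi-square test, the function name `allelic_association` and the counts used below are assumptions made for this example only.

```python
import math


def allelic_association(case_counts, control_counts):
    """Chi-square test (1 degree of freedom) on a 2 x 2 allele-count table.

    case_counts, control_counts: (count of allele 1, count of allele 2)
    in the trait positive and control populations respectively.
    Returns (chi-square statistic, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column of the table must be non-empty")
    n = row1 + row2
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 60/40 in trait positive subjects, 40/60 in controls.
chi2, p_value = allelic_association((60, 40), (40, 60))
```

A small p-value would suggest, under these assumptions, that the allele frequencies differ between the two populations.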

39. A method of estimating the frequency of a haplotype for a set of biallelic markers in a
population, comprising:

a) genotyping at least one HKLP-related biallelic marker according to claim 29 for each
individual in said population;



b) genotyping a second biallelic marker by determining the identity of the nucleotides at said 
second biallelic marker for both copies of said second biallelic marker present in the genome of each 
individual in said population; and 

c) applying a haplotype determination method to the identities of the nucleotides determined
in steps a) and b) to obtain an estimate of said frequency.

40. A method according to claim 39, wherein said haplotype determination method is
selected from the group consisting of asymmetric PCR amplification, double PCR amplification of
specific alleles, the Clark method, or an expectation maximization algorithm.
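
By way of non-limiting illustration, an expectation maximization algorithm of the kind named in claim 40 may be sketched for two biallelic markers as follows. The sketch assumes unphased genotypes given as pairs of two-letter strings, Hardy-Weinberg proportions, and a fixed number of iterations; the function names and the sample data are hypothetical.

```python
from collections import defaultdict


def phase_options(g1, g2):
    """Return the distinct haplotype pairs compatible with genotypes g1, g2."""
    options = {
        tuple(sorted((g1[0] + g2[0], g1[1] + g2[1]))),
        tuple(sorted((g1[0] + g2[1], g1[1] + g2[0]))),
    }
    return sorted(options)


def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate two-marker haplotype frequencies from unphased genotypes."""
    haplotypes = sorted(
        {h for g1, g2 in genotypes for pair in phase_options(g1, g2) for h in pair}
    )
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(iterations):
        counts = defaultdict(float)
        for g1, g2 in genotypes:
            options = phase_options(g1, g2)
            # E-step: probability of each phase under current frequencies.
            weights = [
                freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in options
            ]
            total = sum(weights)
            for (h1, h2), w in zip(options, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: renormalise expected counts over 2N chromosomes.
        n_chromosomes = 2 * len(genotypes)
        freq = {h: counts[h] / n_chromosomes for h in haplotypes}
    return freq


# Hypothetical sample: only the last individual is ambiguous in phase.
sample = [("AA", "CC")] * 3 + [("GG", "TT")] * 3 + [("AG", "CT")]
estimates = em_haplotype_frequencies(sample)
```

Only a double heterozygote has more than one compatible phase, so in this made-up sample the estimate is driven by the six unambiguous individuals.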

41. A method of detecting an association between a haplotype and a trait, comprising the
steps of: 

a) estimating the frequency of at least one haplotype in a trait positive population according
to the method of claim 39;

b) estimating the frequency of said haplotype in a control population according to the
method of claim 39; and

c) determining whether a statistically significant association exists between said haplotype 
and said trait. 

42. A method according to claim 38, wherein said genotyping steps a) and b) are performed
on a single pooled biological sample derived from each of said populations.

43. A method according to claim 38, wherein said genotyping steps a) and b) are performed
separately on biological samples derived from each individual in said populations. 

44. A method according to either claim 38 or 41, wherein said control population is a trait 
negative population. 

45. A method according to either claim 38 or 41, wherein said case control population is a
random population.

46. An isolated, purified, or recombinant polypeptide comprising a contiguous span of at 
least 6 amino acids of SEQ ID No 4, wherein said contiguous span includes at least 1 of the amino 
acid positions 1-478 of the SEQ ID No 4. 

47. An isolated or purified antibody composition capable of selectively binding to an
epitope-containing fragment of a polypeptide according to claim 46, wherein said epitope comprises
at least 1 of the amino acid positions 1-478 of the SEQ ID No 4.

48. A method according to any one of claims 27 to 45 wherein said HKLP-related biallelic
marker is selected from the group consisting of A1 to A32, and the complements thereof.

49. A method according to any one of claims 27 to 45 wherein said HKLP-related biallelic
marker is selected from the group consisting of A1 to A22 and A25 to A32, and the complements
thereof.



50. A method according to any one of claims 27 to 45 wherein said HKLP-related biallelic
marker is selected from the group consisting of A23 and A24, and the complements thereof.

51. A diagnostic kit comprising a polynucleotide according to any one of claims 1 to 21.

52. Use of a polynucleotide comprising a contiguous span of at least 12 nucleotides of a 
sequence selected from the group consisting of the SEQ ID Nos 1-3 and 5-8 or the complementary 
sequence thereto for determining the identity of the nucleotide at a HKLP-related biallelic marker.

53. Use according to claim 52 in a microsequencing assay, wherein the 3' end of said 
contiguous span is located at the 3' end of said polynucleotide and wherein the 3' end of said 
polynucleotide is located 1 nucleotide upstream of said HKLP-related biallelic marker in said
sequence. 



54. Use according to claim 52 in a hybridization assay, wherein said span includes said 
HKLP-related biallelic marker.



55. Use according to claim 52 in a specific amplification assay, wherein the 3' end of said
contiguous span is located at the 3' end of said polynucleotide and said biallelic marker is present at
the 3' end of said polynucleotide.

56. Use according to claim 52 in a sequencing assay, wherein the 3' end of said contiguous 
span is located at the 3' end of said polynucleotide. 
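
By way of non-limiting illustration, the positional relationships recited in claims 53 and 54 may be sketched as follows. The function names and the default lengths (19 nucleotides for a microsequencing primer, 25 nucleotides for a probe centered on the marker) are assumptions made for this example; they are inferred from the coordinates given in the sequence listing for marker 12-809-119 at position 7159 of SEQ ID No 1 and are not recited in the claims.

```python
def microsequencing_primer(marker_pos, length=19):
    """1-based (start, end) of a primer whose 3' end lies 1 nt upstream of the marker."""
    return (marker_pos - length, marker_pos - 1)


def complementary_microsequencing_primer(marker_pos, length=19):
    """Coordinates, on the listed strand, of the primer read from the complement."""
    return (marker_pos + 1, marker_pos + length)


def centered_probe(marker_pos, length=25):
    """(start, end) of a hybridization probe with the marker at its center."""
    if length % 2 == 0:
        raise ValueError("length must be odd for the marker to be centered")
    half = length // 2
    return (marker_pos - half, marker_pos + half)


# Marker 12-809-119 is listed at position 7159 of SEQ ID No 1.
primer = microsequencing_primer(7159)
complement_primer = complementary_microsequencing_primer(7159)
probe = centered_probe(7159)
```

With these assumed lengths the computed coordinates agree with the primer_bind and misc_binding entries listed for that marker (7140..7158, 7160..7178 and 7147..7171).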

57. Use according to any one of claims 52-56, wherein said HKLP-related biallelic marker is a
biallelic marker selected from the group consisting of the biallelic markers A1 to A32.

58. A computer readable medium having stored thereon a sequence selected from the group
consisting of a nucleic acid code comprising one of the following:

a) a contiguous span of at least 12 nucleotides of SEQ ID No 1, wherein said contiguous
span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of SEQ ID No 1:
1-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125, 45210-45440, 45622-45717,
45791-68580, 68675-70246, 70396-72421, 72601-73295, 73434-74648, 74898-83055,
83175-85192, 85279-85609, 85740-85906, 86070-88304, 88396-90585, 90705-91767, 91824-94380,
94490-96296, 96364-97184, 97270-101167, 101274-109465, 109581-110228, 110363-111819,
111882-113636, 113783-113945, 114186-117002, 117075-119676, and 119677-121162;

b) a contiguous span of at least 12 nucleotides of SEQ ID No 2 or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and 8885-10884;

c) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complements thereof,
wherein said contiguous span comprises a G at position 7159 of SEQ ID No 1;

d) a contiguous span of at least 12 nucleotides of SEQ ID No 4 or the complements thereof,
wherein said contiguous span comprises a C either at position 2551 or 4500 of SEQ ID No 4;

e) a contiguous span of at least 12 nucleotides of SEQ ID No 3 or the complements thereof,
wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide
positions of SEQ ID No 3: 391-1619 and 6988-10682;

f) a contiguous span of at least 12 nucleotides of SEQ ID No 3 or the complements thereof,
wherein said contiguous span comprises a nucleotide selected in the group consisting of a C at
position 5487, and a C at position 6265 of SEQ ID No 3; and

g) a nucleotide sequence complementary to any one of the contiguous spans of a), b), c), d), e),
or f).
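
By way of non-limiting illustration, the positional condition of part b) of claim 58 may be checked as sketched below for the simplest case of "at least 1" listed position. The position ranges are those recited for SEQ ID No 2; the function name `span_satisfies_b` and the example spans are hypothetical.

```python
# Position ranges recited in part b) for SEQ ID No 2 (1-based, inclusive).
SEQ2_RANGES = [(1, 1600), (1751, 2138), (2332, 2539), (2659, 3829), (8885, 10884)]


def span_satisfies_b(start, end, ranges=SEQ2_RANGES, min_length=12):
    """True if the span [start, end] is at least min_length nucleotides long
    and includes at least one position falling within a listed range."""
    if end - start + 1 < min_length:
        return False
    return any(start <= hi and lo <= end for lo, hi in ranges)


# Hypothetical 12-nucleotide spans of SEQ ID No 2.
inside = span_satisfies_b(1595, 1606)   # overlaps positions 1595-1600
outside = span_satisfies_b(1601, 1612)  # lies wholly between 1600 and 1751
```

The same overlap test would apply to the longer range list of part a) by passing a different `ranges` argument.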

59. A computer readable medium having stored thereon a sequence consisting of a 
polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ ID No 4, wherein 
said contiguous span includes at least 1 of the amino acid positions 1-478 of the SEQ ID No 4. 

60. A computer system comprising a processor and a data storage device wherein said data
storage device comprises a computer readable medium according to claim 58 or 59.

61. A computer system according to claim 60, further comprising a sequence comparer and
a data storage device having reference sequences stored thereon.

62. A computer system of Claim 61 wherein said sequence comparer comprises a computer
program which indicates polymorphisms.

63. A computer system of Claim 60 further comprising an identifier which identifies
features in said sequence.

64. A method for comparing a first sequence to a reference sequence, comprising the steps
of:

reading said first sequence and said reference sequence through use of a computer program
which compares sequences; and

determining differences between said first sequence and said reference sequence with said
computer program,

wherein said first sequence is selected from the group consisting of a nucleic acid code
comprising one of the following:

a) a contiguous span of at least 12 nucleotides of SEQ ID No 1, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of
SEQ ID No 1: 1-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125,
45210-45440, 45622-45717, 45791-68580, 68675-70246, 70396-72421, 72601-73295,
73434-74648, 74898-83055, 83175-85192, 85279-85609, 85740-85906, 86070-88304,
88396-90585, 90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167,
101274-109465, 109581-110228, 110363-111819, 111882-113636, 113783-113945,
114186-117002, 117075-119676, and 119677-121162;

b) a contiguous span of at least 12 nucleotides of SEQ ID No 2 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following
nucleotide positions of SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and
8885-10884;

c) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises a G at position 7159 of SEQ ID No 1;

d) a contiguous span of at least 12 nucleotides of SEQ ID No 4 or the complements
thereof, wherein said contiguous span comprises a C either at position 2551 or 4500 of SEQ
ID No 4;

e) a contiguous span of at least 12 nucleotides of SEQ ID No 3 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following
nucleotide positions of SEQ ID No 3: 391-1619 and 6988-10682;

f) a contiguous span of at least 12 nucleotides of SEQ ID No 3 or the complements
thereof, wherein said contiguous span comprises a nucleotide selected in the group
consisting of a C at position 5487, and a C at position 6265 of SEQ ID No 3; and

g) a nucleotide sequence complementary to any one of the contiguous spans of a), b),
c), d), e), or f); and

a polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ
ID No 4, wherein said contiguous span includes at least 1 of the amino acid positions 1-478
of the SEQ ID No 4.

65. A method according to Claim 64, wherein said step of determining differences between 
the first sequence and the reference sequence comprises identifying at least one polymorphism. 
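
By way of non-limiting illustration, the determining step of claims 64 and 65 may be sketched as a position-by-position comparison. The sketch assumes that the two sequences are already aligned and of equal length, which the claims do not require; the function name `find_differences` and the short sequences are hypothetical.

```python
def find_differences(first, reference):
    """List candidate polymorphisms between two pre-aligned sequences.

    Returns (1-based position, reference base, first-sequence base)
    for every position at which the two sequences differ.
    """
    if len(first) != len(reference):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return [
        (index + 1, ref_base, first_base)
        for index, (first_base, ref_base) in enumerate(
            zip(first.lower(), reference.lower())
        )
        if first_base != ref_base
    ]


# Hypothetical 12-nucleotide sequences differing at one position.
differences = find_differences("atttttgcgtat", "atttttgcctat")
```

A practical sequence comparer would first align the sequences; that alignment step is outside the scope of this sketch.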

66. A method for identifying a feature in a sequence, comprising the steps of:

reading said sequence through the use of a computer program which identifies features in
sequences; and

identifying features in said sequence with said computer program;
wherein said sequence is selected from the group consisting of a nucleic acid code
comprising one of the following:

a) a contiguous span of at least 12 nucleotides of SEQ ID No 1, wherein said
contiguous span comprises at least 1, 2, 3, 5, or 10 of the following nucleotide positions of
SEQ ID No 1: 1-39624, 39705-40589, 40666-43629, 43710-44203, 44311-45125,
45210-45440, 45622-45717, 45791-68580, 68675-70246, 70396-72421, 72601-73295,
73434-74648, 74898-83055, 83175-85192, 85279-85609, 85740-85906, 86070-88304,
88396-90585, 90705-91767, 91824-94380, 94490-96296, 96364-97184, 97270-101167,
101274-109465, 109581-110228, 110363-111819, 111882-113636, 113783-113945,
114186-117002, 117075-119676, and 119677-121162;

b) a contiguous span of at least 12 nucleotides of SEQ ID No 2 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following
nucleotide positions of SEQ ID No 2: 1-1600, 1751-2138, 2332-2539, 2659-3829 and
8885-10884;

c) a contiguous span of at least 12 nucleotides of SEQ ID No 1 or the complements
thereof, wherein said contiguous span comprises a G at position 7159 of SEQ ID No 1;

d) a contiguous span of at least 12 nucleotides of SEQ ID No 4 or the complements
thereof, wherein said contiguous span comprises a C either at position 2551 or 4500 of SEQ
ID No 4;

e) a contiguous span of at least 12 nucleotides of SEQ ID No 3 or the complements
thereof, wherein said contiguous span comprises at least 1, 2, 3, 5, or 10 of the following
nucleotide positions of SEQ ID No 3: 391-1619 and 6988-10682;

f) a contiguous span of at least 12 nucleotides of SEQ ID No 3 or the complements
thereof, wherein said contiguous span comprises a nucleotide selected in the group
consisting of a C at position 5487, and a C at position 6265 of SEQ ID No 3; and

g) a nucleotide sequence complementary to any one of the contiguous spans of a), b),
c), d), e), or f); and

a polypeptide code comprising a contiguous span of at least 6 amino acids of SEQ
ID No 4, wherein said contiguous span includes at least 1 of the amino acid positions 1-478
of the SEQ ID No 4.

67. A method for the screening of a candidate substance interacting with a HKLP
polypeptide comprising the following steps:

a) providing a polypeptide consisting of a HKLP protein or a fragment comprising a
contiguous span of at least 6 amino acids of SEQ ID No 4, wherein said contiguous
span includes at least 1 of the amino acid positions 1-478 of the SEQ ID No 4 or a variant thereof;

b) obtaining a candidate substance;

c) bringing into contact said polypeptide with said candidate substance; 

d) detecting the complexes formed between said polypeptide and said candidate substance. 
<213> Homo sapiens
<222> 70247..70395
<223> exon
<221> exon

<222> 72422..72600

<223> exon
<221> exon

<222> 73296..73433
<220>

<221> exon 

<222> 97185.. 97269 

<223> exon 

<220> 

<221> exon 

<222> 101168. .101273 
<223> exon 

<220> 

<221> exon 

<222> 109466. .109580 
<223> exon 

<220> 

<221> exon 

<222> 110229. .110362 
<223> exon 

<220> 

<221> exon 
<223> exon
<221> exon
<223> exon

<220> 

<221> exon 

<222> 113946. .114185 

<223> exon 

<220> 

<221> exon 

<222> 117003. .117074 

<223> exon 

<220> 

<221> exon 

<222> 119677. .119798 

<223> exon 

<220> 

<221> allele 

<222> 7159 

<223> 12-809-119 : polymorphic base G or C 
<220> 

<221> allele 

<222> 16369 

<223> 12-805-115 : polymorphic base A or G 
<220> 

<221> allele 

<222> 26310 

<223> 12-790-396 : polymorphic base A or G 
<220> 

<221> allele 

<222> 31112 
<223> 12-791-211 : polymorphic base A or G
<220> 

<221> allele 
<222> 33808 

<223> 12-803-125 : polymorphic base A or T 
<220> 

<221> allele 
<222> 34255 

<223> 99-33040-321 : polymorphic base C or T 
<220> 

<221> allele 
<222> 35293 

<223> 12-810-77 : polymorphic base A or G 
<220> 

<221> allele 
<222> 45892 

<223> 12-787-103 : polymorphic base A or G 
<220> 

<221> allele 
<222> 54497 

<223> 12-793-383 : polymorphic base G or T 
<220> 

<221> allele 
<222> 56582 

<223> 12-792-233 : polymorphic base A or G 
<220> 

<221> allele 
<222> 60336 

<223> 99-41009-244 : polymorphic base A or G 
<220> 

<221> allele 
<222> 60469 

<223> 99-41009-111 : polymorphic base C or T 
<220> 

<221> allele 
<222> 79063 

<223> 12-593-287 : insertion of TAAAT 
<220> 

<221> allele 
<222> 79176 

<223> 12-593-174 : polymorphic base C or T 
<220> 

<221> allele 
<222> 80016 

<223> 12-589-152 : polymorphic base G or T 
<220> 

<221> allele 
<222> 87463 

<223> 12-785-200 : polymorphic base C or T 
<221> allele
<222> 87656 

<223> 12-785-393 : polymorphic base A or G 
<220> 

<221> allele 
<222> 101602 

<223> 12-588-103 : polymorphic base A or G 
<220> 

<221> allele 
<222> 104391 

<223> 12-603-191 : polymorphic base C or T 
<220> 

<221> allele 
<222> 117430 

<223> 12-586-414 : polymorphic base A or G 
<220> 

<221> allele 
<222> 117459 

<223> 12-586-443 : deletion of C 
<220> 

<221> primer_bind 
<222> 7041. .7060 
<223> 12-809. pu 

<220> 

<221> primer_bind

<222> 7545.. 7565 

<223> 12-809. rp complement 

<220> 

<221> primer_bind 
<222> 16004. .16024 
<223> 12-805. rp 

<220> 

<221> primer_bind 

<222> 16464. .16483 

<223> 12-805. pu complement 

<220> 

<221> primer_bind
<222> 26232. .26252 
<223> 12-790. rp 

<220> 

<221> primer_bind 

<222> 26685. .26705 

<223> 12-790. pu complement 

<220> 

<221> primer_bind 
<222> 30902. .30922 
<223> 12-791. pu 

<220> 

<221> primer_bind

<222> 31282..31301

<223> 12-791. rp complement
<221> primer_bind

<222> 33476. .33496 

<223> 12-803. rp 

<220> 

<221> primer_bind 

<222> 33912. .33932 

<223> 12-803. pu complement 

<220> 

<221> primer_bind
<222> 33934. .33953 
<223> 99-33040. pu 

<220> 

<221> primer_bind 

<222> 34364. .34383 

<223> 99-33040. rp complement 

<220> 

<221> primer_bind 
<222> 34918. .34938 
<223> 12-810. rp 

<220> 

<221> primer_bind

<222> 35350..35369

<223> 12-810. pu complement

<220> 

<221> primer_bind 
<222> 45465. .45485 
<223> 12-787. rp 

<220> 

<221> primer_bind 
<220>
<223> 12-793. rp

<220> 

<221> primer_bind 
<223> 12-793. pu complement

<220> 

<221> primer_bind 
<223> 12-792. rp

<220> 

<221> primer_bind

<222> 56793..56813

<223> 12-792. pu complement

<220>

<221> primer_bind
<222> 60111..60130
<223> 99-41009. rp

<220>

<221> primer_bind

<222> 60562..60580

<223> 99-41009. pu complement

<220> 

<221> primer_bind 
<222> 78815.. 78835 
<223> 12-593. rp 

<220> 

<221> primer_bind 

<222> 79332.. 79349 

<223> 12-593. pu complement 

<220> 

<221> primer_bind
<222> 79706..79726
<223> 12-589. rp

<220>

<221> primer_bind

<222> 80149..80167

<223> 12-589. pu complement

<220>

<221> primer_bind

<222> 87264..87284

<223> 12-785. pu

<220>

<221> primer_bind

<222> 87753..87773

<223> 12-785. rp complement

<220>

<221> primer_bind
<222> 101161..101178
<223> 12-588. rp

<220>

<221> primer_bind
<222> 101686..101704
<223> 12-588. pu complement

<220>

<221> primer_bind
<222> 104131..104151
<223> 12-603. rp

<220>

<221> primer_bind
<222> 104558..104578
<223> 12-603. pu complement

<220>

<221> primer_bind
<222> 117017..117036
<223> 12-586. pu
<220>

<221> primer_bind 

<222> 117481. .117501 

<223> 12-586. rp complement 

<220> 

<221> primer_bind

<222> 7140. .7158 

<223> 12-809-119. mis 

<220> 

<221> primer_bind

<222> 7160.. 7178 

<223> 12-809-119. mis complement 

<220> 

<221> primer_bind 
<222> 16350. .16368 
<223> 12-805-115. mis 

<220> 

<221> primer_bind 

<222> 16370. .16388 

<223> 12-805-115. mis complement 

<220> 

<221> primer_bind
<222> 26291. .26309 
<223> 12-790-396. mis 

<220> 

<221> primer_bind 

<222> 26311. .26329 

<223> 12-790-396. mis complement 

<220> 

<221> primer_bind 
<222> 31093. .31111 
<223> 12-791-211. mis 

<220> 

<221> primer_bind

<222> 31113. .31131 

<223> 12-791-211. mis complement 

<220> 

<221> primer_bind 
<222> 33789. .33807 
<223> 12-803-125. mis 

<220> 

<221> primer_bind 

<222> 33809. .33827 

<223> 12-803-125. mis complement 

<220> 

<221> primer_bind
<222> 34236..34254
<223> 99-33040-321. mis

<220>

<221> primer_bind
<222> 34256..34274
<223> 99-33040-321. mis complement
<220>

<221> primer_bind
<222> 35274..35292
<223> 12-810-77. mis

<220>

<221> primer_bind

<222> 35294..35312

<223> 12-810-77. mis complement

<220>

<221> primer_bind
<222> 45873..45891
<223> 12-787-103. mis

<220> 

<221> primer_bind 

<222> 45893. .45911 

<223> 12-787-103. mis complement 

<220> 

<221> primer_bind 
<222> 54478. .54496 
<223> 12-793-383. mis 

<220> 

<221> primer_bind 

<222> 54498. .54516 

<223> 12-793-383. mis complement 

<220> 

<221> primer_bind
<222> 56563..56581
<223> 12-792-233. mis

<220>

<221> primer_bind

<222> 56583..56601

<223> 12-792-233. mis complement

<220>

<221> primer_bind
<222> 60317..60335
<223> 99-41009-244. mis

<220>

<221> primer_bind

<222> 60337..60355

<223> 99-41009-244. mis complement

<220>

<221> primer_bind
<222> 60450..60468
<223> 99-41009-111. mis

<220> 

<221> primer_bind 

<222> 60470. . 60488 

<223> 99-41009-111. mis complement 

<220> 
<221> primer_bind
<222> 79157..79175
<223> 12-593-174. mis

<220>

<221> primer_bind

<222> 79177..79195

<223> 12-593-174. mis complement

<220>

<221> primer_bind

<222> 79997..80015

<223> 12-589-152. mis

<220> 

<221> primer_bind 

<222> 80017. .80035 

<223> 12-589-152. mis complement 

<220> 

<221> primer_bind 
<222> 87444.. 87462 
<223> 12-785-200. mis 

<220> 

<221> primer_bind

<222> 87464. .87482 

<223> 12-785-200. mis complement 

<220> 

<221> primer_bind 
<222> 87637. .87655 
<223> 12-785-393. mis 

<220> 

<221> primer_bind 

<222> 87657. .87675 

<223> 12-785-393. mis complement 

<220> 

<221> primer_bind 
<222> 101583. .101601 
<223> 12-588-103. mis 

<220> 

<221> primer_bind 

<222> 101603. .101621 

<223> 12-588-103. mis complement 

<220> 

<221> primer_bind
<222> 104372. .104390 
<223> 12-603-191. mis 

<220> 

<221> primer_bind 

<222> 104392. .104410 

<223> 12-603-191. mis complement 

<220> 

<221> primer_bind 
<222> 117411. .117429 
<223> 12-586-414. mis 
<220>

<221> primer_bind

<222> 117431..117449

<223> 12-586-414. mis complement

<220> 

<221> misc_binding
<222> 7147..7171
<223> 12-809-119. probe

<220>

<221> misc_binding
<222> 16357..16381
<223> 12-805-115. probe

<220>

<221> misc_binding
<222> 26298..26322
<223> 12-790-396. probe

<220>

<221> misc_binding
<222> 31100..31124
<223> 12-791-211. probe

<220>

<221> misc_binding

<222> 33796..33820

<223> 12-803-125. probe

<220>

<221> misc_binding
<222> 34243..34267
<223> 99-33040-321. probe

<220>

<221> misc_binding
<222> 35281..35305
<223> 12-810-77. probe

<220>

<221> misc_binding
<222> 45880..45904
<223> 12-787-103. probe

<220>

<221> misc_binding
<222> 54485..54509
<223> 12-793-383. probe

<220>

<221> misc_binding
<222> 56570..56594
<223> 12-792-233. probe

<220> 

<221> misc_binding 
<222> 60324. .60348 
<223> 99-41009-244. probe 

<220> 

<221> misc_binding 
<222> 60457..60481
<223> 99-41009-111. probe 

<220> 

<221> misc_binding
<222> 79164. .79188 
<223> 12-593-174. probe 

<220> 

<221> misc_binding 
<222> 80004..80028
<223> 12-589-152. probe 

<220> 

<221> misc_binding 
<222> 87451. .87475 
<223> 12-785-200. probe 

<220> 

<221> misc_binding 
<222> 87644. .87668 
<223> 12-785-393. probe 

<220> 

<221> misc_binding 
<222> 101590. .101614 
<223> 12-588-103. probe 

<220> 

<221> misc_binding 
<222> 104379. .104403 
<223> 12-603-191. probe 

<220> 

<221> misc_binding
<222> 117418. .117442 
<223> 12-586-414. probe 

<220> 

<221> misc_feature 
<222> 20352,102280 
<223> n=a, g, c or t 

<400> 1 

ttggtagaat tggtagctaa aaggctgagt gaaatatggc ttaaaagctt tattggctgg 60 

gcgcggtggc tcacccctgt aatcccagca ctttgggtgg cttaggtggg tggataacct 120 

gaggtcagga gtttgagaac agctgaccaa catggtgaaa ccctgtctct actagacata 180 

caaaattagc caggtgtggt gatgcatgcc tgtaatccta gatacgtggc aggctgaggc 240

aggagaatta cttgaaccca ggaggcggag gttgcagtga gctgagatcc taccattgca 300 

ctccagcctg ggcaacaaga gcaaaactcc gtctcaaaaa aaaaaaagct ttatcattta 360 

ttttttggcc ctgtcttatg gtgcagaggc ttaaaagttt tttgacagca aattttctag 420 

aggctaggag tgtttattat aaccatgttt ttgagcggtg aggactacct cagagggcat 480 

gccttgtgtc atcattgttc ttattgctga gctaccgaaa cctagaatct gactcacaca 540 

atatgacact tatttccgtt ttcttggtag agtttgtgtg gtcattcatc ttttagatct 600 

tttagaaact acagtcctcc tagttcccac ttttatattt atttatttat ttatttttga 660 

gatggagtct tgctctgttg cccaggctgg agtgcagtgg tgcaatctcg gctcactgca 720 

acctctgcct cccaggttca agcaattctc ccgcctcagc ctcccaagta gctgggatta 780 

caggtgcgtg ccaccatgcc tcgctaactt ttgtattttt tgtagagacg gggtttcacc 840 

acgttggcca ggctggtcgc aaactcctga cctcaagtga tcaacccccc tcagcctccc 900 

aaagtgctgg gattacaggc gtgagtgacc gtgcccagcc atcagttccc acttttaaat 960 

gagaggttct ttgtttttgt ggggtttttg ttgttgttgt tgttgttgtt gttgttgctg 1020 

ttgttacttg agacagggtc ttagtctgtc atccaggatg gagtgcagtg gcacaatctt 1080 

ggctaactgc agcctggacc tccctgggct caggtgatct tcctacctaa gccccctgag 1140 
tagctgggag tgcaggcaca caccaccatg cctagctaac ttttgtattt tttgtggaga 1200

tgagatttca ccatgttgcc caggctggtc tcaaattcct gtactccagc gatcctcctg 1260

cctgagcttc ccaaagtgct gggattataa gtgtgagtca ccgagccagg tcataaagac 1320

agtttttttt tgttttgaga tggagttttg ctcttttgta cccaggctgg agtacagtgg 1380

tgtgatcttg gctcactgca acctccacct ctcgggttca agcaattctc ctgtcttact 1440

tagcctcccg agtagctggg attacaggcg tgccccacca cgctgcactg attttttata 1500

tttttaacag agatggggtt tcaccatgtt ggacaggctg gtctgcaact cctgacctca 1560

ggtgatctgc ccacctcagc ctcccaaagt actgggatta cagatgtgag ccactgtgcc 1620

tggccaataa agagaatttt aaacaatgcc tacagtgtga ttttagtggg gtggccacat 1680

acatgtaggt gtgcacatgt atttccatga aaatgccttc ctgcctttag ccttgacaat 1740

gctgcatata gggaataagc atttaaatga tggtgtctgg gctaagagtg tgaactctgg 1800

agccaaaccc tccaaaattg catcctgaat ctgccactag ctagctctgg agccatgaat 1860

aacttattta acctctctgt gatggtttat ctatctgtaa aatgggggta atgaaacatc 1920

tggctcattg agtgcttggg agcatcaaat aggttagtag gtctggtgtg gtggctcaca 1980

cctgtaatcc cagcactttg ggaagctgag gcatgaggat ggcttgagcc caggagtttg 2040

agaccagcct gggcaacata gtgagactcc catctctaca aaaaagaaaa ataatagccg 2100

gtgtagtggc atgaacctgt ggtcccatct actcagaggc tgaggtggag gatcactgga 2160

gcccaggagg ttgagtctgt agtgaattgt ggttgtacca ctgtactcca gtctgggcct 2220

cagagttaga tcctgtctca ggaaaaaaaa aggttcgtag acagaaagta cttatggctc 2280

agagtaagtg ttctgtgaat gttggctgtt ttttaaatct tgcatttaat tcctctttgg 2340

ttagaagtat acttctttgc ttggattacc atttgcatgt ggagagatgt tccttctaga 2400

accatgtctc cttgtcacat gattatgtac agtgtgagct ttattaatac tctgtcagca 2460

tttgaggttg gaacataaat gcgtggctgt acccagtgga atccagcaag agagagaact 2520

aagcaccgta gcaactgttg ctatttgtgt actgagcaac agctgtgcga gacagattta 2580

ttttttaatc ctgatctcac tgatagcctg ggatgtggcc ttgccacatc tcaaaggagt 2640

agattggttg agatatccag tctttgagga gatttcatga ccacatgcca taaagattct 2700

gtttcatttc aaatatgagg agctaaagct ttgtgtgtat gagtgaggga gaaagattaa 2760

gctttaaagc attcatagtc agtccctgaa atatagtgtg actggattaa tgacatgata 2820

aatcactttt aatgtgggtg gtgagacttc atgcatttca gtcaaaaaag taattttgct 2880

gtaatgtggt tcatttccaa attttggctg ctaacttacc taggatggat aaacattatc 2940

acagagtatt tactgagcat ctactttata ccagggtctt ttctgaatac tggtaacatg 3000

cagttaatta cactccaaaa tgacttagtc tgtgattcca ctatatacta cattacataa 3060

atacacacac actaatcatt tacatacttg tcaaccttag atttcagaaa taacactgta 3120

tttgtaatga atgcttcatt accaaatgct tttctgaagt acatagtaag atctttaaaa 3180

agagctttca tggccaggcg cggtggctca cgcttgtaat cccagcactt tgggaggccg 3240

aggcgggcgg atcacaaggt caggagatca agaccatcct ggctaacacg gtgaaacccc 3300

gtctctacta aaaatacaaa aaaattagct gggcgtggtg gcgggcgcct gtagtcctag 3360

ctactcagga ggctgaggca ggagaatggc gtgaacctgg gaggtggagc ttgcagtgag 3420

ccgagatcgt gccactcccc ttcagctagg gcgacagagg gagactctgt ctcaaaaaaa 3480

aaaaaaaaaa aaaaaaggct gctttcaaat agtggttgga ttcattcaat aactttctaa 3540

ttttcatttc tctttttttt cttgagatga agtcttactc tgtcacccag gctggagtgc 3600

agtgcattaa tatcagctca gtgcaacctc tgcctcccgg gttcaagtga ttctcctgcc 3660

tcaacctccc cagtagctgg gattataggc acacaccact atttctggct aatttttgtg 3720

tttttggtag agatggggtt tcaccatgtt cgccaggcta gtctcgaatt cctgacctca 3780

agtgatacac ctgccttggc ctcccaaagt gctgggatta caagtgggag ccaccgcgcc 3840

cagctccatt tctgatgtgt tccaatattt cagggcttaa cgggtgcctt gaaatagtaa 3900

tgtgggccag gtgcagtggc tcacgcctgt aatcccagca gttagggagg cggagatgcg 3960

cagatcacct gaggtcagga gttcgagacc agcctggcca acatggtgaa accctgtctc 4020

tactaaaaat aaaaaaatta gccagttgta gtggtggatg ccagtaatcc cagctactca 4080

ggaggctgag gcgggagaat ttcttgaacc cgggatgagg tggttgcagt gagccgagat 4140

cacaccactg cactccagct tggggaatag agcaagactc tgtcttcaaa aagaaaagaa 4200

atagaaatgt cgttcatttc caaattttgg ctggtaactt acctaaggag gataaacatt 4260

gtcacagaat atttactgag catctacttt ataccagggt ctgttctgaa tactggtaac 4320

acgcagttaa tgctgttggt ctggtttttc atgttccatg tataaattac ctcttaaaac 4380

tatgattgtt ttcattcatt cattcatttt tgagatcggg tctcaaaata ggctggggtg 4440

cagtggctcg ataacagcta actacatcct caaacctcca ggctcaagcc atcctccgac 4500

ctcagcttcc taagtagctg ggactgcagg catgtgccac catgcccagc tagttgttgt 4560

tgtttttttt ttttttggta tttgttgtag agacagggtt tcaccatgtt gcccagggtg 4620

gtcttgaact cctgacctct agtgatccac ccagctcgga ctctcaaaat gctgaaatta 4680

caggcatgag ccactgagcc cagcctatta atttttgatt gccctattca gatttagtgg 4740

tatgaattta ctgattggga tgttttgaac acacagaata acatgtttaa ctaaagaact 4800

tgtaattgag tgacttataa aatgaaacat ttttatcttc taggtattat taacccaaag 4860

aatccaaagg aagctccaaa gtccttcagc ttcgactatt cctactggtc tcatacctca 4920
gtgagtaccc tcatgccaca gcactgccag ctcctgcctc ctttcctctt tccttgccga 4980

tttgtctttt ctcttagctg agcagatcca agttttgctt cttaactgtt gtgtagcatg 5040 

agttcattct ttgatacagt gatatagtat tttcattcat aaagtatttg ggttgatgga 5100 

aagtttatgt acacatatac ataaatatat acacatatat acatacagaa catatttagt 5160 

ttacaccaaa tttatactga atgtacacta aatttatact gatttagaca tttcttgagg 5220 

atttatttat cagtatctta ggatcctcat taccaacatt attgcccatt gatcttagga 5280 

aatgccaaat aataataatt attattatta ttattatttt cttctgagac ggagtctcac 5340 

tctgtcgccc aggctggagt acagtggtgc catcttggct cactgcaacc tccgctcgag 5400 

gttcaagcga ttctcccgcc tcacctcccg agtagctggg attacaggcg cgtggcacta 5460 

tgcccagcta atttttgtat ttttagtaag acaggatttg accatgtggc cagctggtct 5520 

cgagcttctg accttagtga tccacctgtg tttgcctccc aaagtgctgg gattacaggt 5580 

gtgagccact gtgcctggct ggaaatgcca aattaaaatt gcttagggta ctgttaattt 5640

taagttgaat tttgagggaa tggccagcaa gattggaatg accacagaga aattgttttt 5700 

tttttgtttg tttgtttgtt tttggctagg ctgatcttga actcctgacc tcctgatcca 5760

cctgcctcga cctctcaaag tgttgggatt aaaggagtga gccaccgtgc ctggccaatt 5820 

tttttttttt aattgtagta gagcaatgtg ggttctgaac attttgaaag atgataccgt 5880 

gcctctacta aaaattccaa aaaaaaaaaa aattaggtgg gcttggtggc gggtgcctgt 5940

aatcccagct acttgggagg ctgagggtca tgacctcgcc gtaaccgcat ggcgtcccac 6000 

actgcactcc agcctgggtg atggaacatg actgtgtctc aaaaaaaaaa aaaggaaaga 6060 

ggctgcagat ccacaagaaa ctaaaagatt accagtatta cgaaaatgca gagtagggcc 6120 

cttcctggta cccttactca gtgtggaacc ccaaatgttg ctgcttctta gggattgttc 6180 

atagagttta cttctgcatc attcatctct ttgttacatg tttttggttt tgtttttcct 6240 

gttttgtatt tttttttaaa gactgagtct cgctctgttg cctaggctgg agtgcagtgg 6300 

catgatctca gctcactgca acctacacct cctgggttca agcagttctc ctgcttcagc 6360 

ctcctgagta tctgagatta caggcaggtg ccaccacacc cggctgattt ttttaatttt 6420 

tagtagagat ggggtttcgc tgtgttggcc aggctggtct cgaattccta tcctcaagtg 6480 

atctgcccgc cttggcctcc cagagtgatg ggattacagg catgagccac tgtgccctgc 6540 

cctggggaat cgatgtttta aactattgac aaagcctaat tgtcaaggga agggactgag 6600 

aaggggattg gaaatgctgt ttcctgaaag ctgtgtttat tgagaaatag tcccagtagc 6660 

tggagtgagt tcaaagaagt atttttatat gtggacttga cctttggtct tttattctca 6720 

ttttccactt aagaaaatct tggctgtgtg aatgaaagag tgatactttt aaggttatag 6780 

aaagtgaaat gtaatcatgc cagataattt tatatagata tttttatgta tggctgacct 6840 

ggatgactct aacagtgcat gtgtttgtga gtgtgtgtgt gtgcatgtgc gtgtatttaa 6900 

tgagaaaagt aaacttgtgt ataggaggct taaaaaatgt gtagggaatt ttaggtgact 6960 

gttctgattc cagacacttt tattatggaa gcaatcaagt aagtatagga agaaatatta 7020 

ataaaaggtt atttatttct ccttttactc tttacagccc gaagatccct gttttgcatc 7080 

tcaaaaccgt gtgtacaatg acattggcaa ggaaatgctc ttacacgcct ttgagggata 7140 

taatgtctgt atttttgcst atgggcagac tggtgctgga aaatcttata caatgatggg 7200 

taaacaagaa gaaagccagg ctggcatcat tccacaggtg aaaaacaaaa caaaacaaaa 7260 

atcttctctt cattattagt gttagtctta aattgcttta acagttattt ttatttggcg 7320 

aacatttatg cggggattgt tttatgtcag gcacaaagat gaacaaccca ttattttccc 7380 

tcagaggagc tcacaattga atgggaagga ttgacatgta cacatgtctg tcattaaagg 7440 

tgggaaagtc agtgttttgt aatgattttg ccatatatcc aatgccatat tattttgtca 7500 

tttgaaaagt gttaccagct tgttaaagct ctgttctaag tcctggagat ggggggatat 7560 

tgttgatctg attttttttc aaattccatg catagattac ctctgaagaa tgtgattatt 7620 

tttggttttt ttgatagcct tattcggaga tactttcatt ttattttcct aatttaatac 7680 

tatgacttta aacttcaaac actctgagat tctccctctt tttttttttt tttagaacag 7740 

ttagtattat tgggtttgat cctttattgt ttgggaatga agaagcgttc tgacatagat 7800 

atttatttat ttttattttc attattcgtt tcacttagga tttaatggtg aaagacataa 7860 

gtatttaaaa agaacttcat tatttactta tttatttgag acagggtctc actctgttgc 7920 

ccatgctgga gtgcagtggt gtgatcttgg ctcactgagg ccttgtcctc ccaggctcaa 7980

gtgatcctcc tgccttagcc tccagagtag ctgggactaa agccatgtgc caccacacct 8040 

agctcatttt taaattaaaa aaaaaaaaaa tttttttttt tttttgaggt ggagtctctc 8100 

tctgttgccc aggctggagt gcaatggcgc catcttggct cactgcaacc tccgcctcct 8160 

gggttcaagt gattctcctg cctcagcctc ccaagtagct gggacaatag gcacccacca 8220 

ctatgcctgg ctaatttttg tatttttagt agaggcgggg tttcaccatg ttggtcctgg 8280 

tctggatctc ctgacctcag gcaatccgcc cacctccgcc tcccaaagtg cagggattgc 8340 

aggcatgagt caccatgccc ggcctttaaa attttttttt tagagacagg gtctcaccat 8400 

gttgtccagg ctggtctcga atggacctgg gctccatcct gggcttggga tgatctgcct 8460

gcctcgactt ccagaagtac tgggattaca ggcatgagcc actgtgcggg gcctaaaaac 8520 

ttaagaaaaa aaaaagatta agtgagaaga ttgtatttaa aattttcctt ttatagtcat 8580 

gcatcactta agggtggggg tacattttga gaaatgcatc attagtcgtt aggtgctttt 8640 

gtcattgtac agatatcata gagtgtactt acacaaaccc tggaaggtgt aaactgctcc 8700 
acacctagac tgcgttgtat agcctgttgc tccaaggcta taaacttgtg tagcatgcta 8760
ctgtactgaa tactacaggc aattataata ggatggtaag tatttgtata tctaaacata 8820 
gaaaagtaca ttaaaaatat gatataaaag atttgaaatg gcacacctgc atagtgctct 8880 
tacactggag cttgcagggc tggaagttgc tctgggtgag tcagtgagca agtggtgagt 8940 
aaatgtgaag gcctaggaca ttgctttgca gtattgtaga ctttataaat gctgcacact 9000 
taggctacac taaatttagt tttaaaaaca gtttcctttc aataattaac cttagttaaa 9060 
ttactataac ttttttactt tataaacttt aaaatttttt aaactttctg atccttttat 9120 
aataaccttt agtttaaaac acaaacagcc ccagcatggt ggctcatgcc tgtaatccca 9180 
tcactttggg aggccaaggc gggtggatca ctggaggtca ggagttcgag accagcctga 9240 
ccaacatggt gaaaccccgt gtctactaaa aatacaaaaa aatgagccag gcatggtggc 9300 
gcatgcctgt aatcccagct acttgggagg ctgaggcagg agaattgctt gaacccagga 9360 
gacggaggtt gcagtgagct gagatcgcgc cattgcactc tagcatgagc aacaagagtg 9420 
aactcggtct caaaaaaaaa aaaaagaaac cacacacaca ctttgtacag ctgtacaaaa 9480 
atattttctt tcttcatatc tttattctat aagctttttt gtatttttta attttttttt 9540 
tctttcttga gacagagtct tgctttgtca cccaggctgg agtacagtgg cgccctatca 9600 
gctcactgca acctccgcct cctgggttca attaattctt gtgccttagc ctcctgagta 9660 
gctgggacta caggcacgta ccaccacgtc tggctaattt ttgtattttt agtagagacg 9720 
gggtttcaca atgttggcca ggctggtctt gaactcctgg cctcaagtga tctgcctgcc 9780 
ttggcctccc aaagtgctga gatttacagg cgtgagccac tgcacctggc caaatttttt 9840
tttttttttt ttagacagag tttcactcct cttgcccagg ctggaatgca gtggcacgat 9900 
cttggctcac tgcaacctcc gcctcctggg ttcaagcgat tctcctgcct cagcctccag 9960 
aatagctggg actatttttg tatttttagt agagatgggg tttcaccatg ttggccaaga 10020 
tgttctcgat ctcctgacct tgtgatctgc ctgcctcgtc ctcccaaagt gctgggatta 10080 
caggcatgag ccaccttgcc cagcccaaat ttttaatttt tgaaaaaaaa tttaaacttt 10140 
tttgttaaaa acttaggcac acatttatta gcctaggcct acacagggtc aggatcatca 10200 
agatatcact aggtgacagg aattttccag ctttcttata atcttatggg gatcactatc 10260 
atatatgcag tccatcgttg accgaatcat ggttatgcag cgcatgactg actttattca 10320 
aagcattaat cgtatcttga tgtttatgac ataatatatt ttgagatgga gcaagaaagg 10380 
acgtctggct aattcattga gcacgcgtgg atgacttagt actcctctca tttgtgctct 10440 
tcatgcctct ctcattctac ttccctagtt atgtgaagaa ctttttgaga aaatcaatga 10500 
caactgtaat gaagaaatgt cttactctgt agaggtgagt acagccgtga gttgacaccg 10560 
taagcccttg ttttccattc tctcaagcat cacttaaatg gctccaaatt atgactgtgg 10620 
tacacatcac ttcaccattc cttcattttt gtcctttcag gctatttctg ggttttgggt 10680 
ataatgaata gtgtcctgct ttaatctttt ttttgggggg tgcgcggtgg ggacagggtc 10740 
tcactctgtt gcccaggccc tggagtgcag tggcgtgatc acagctcact gcagccttga 10800 
ccttctgggc caaagcgatc ttaccccctc agcttctcaa gtagctggga caacaggtgt 10860 
gtgccaccac cacccctaga taattgattt ttttgtagag atggagtttc cctatgttgc 10920 
ccaggttggt gttgaactct tgggctcaag caatcttgcc tcctccacct ctgaaagtgc 10980 
tggggttgca gtgtgaatca ctgagcccag ccacaattaa atctttgttc ataagtctgt 11040 
gcatttattt gcagttatta tcttttgtat gtttcttggt ttttaaataa tacatttgaa 11100 
ttgccaaatg cctttttgga aagtaatgct gggtaacagt tttcagtcac actaggaggg 11160 
tgtaagcacc cattgtgccc ttgccaacat ttttctattt tttaatcttt gctaatttga 11220 
taatgtaagt attctcttac ttaaattttg aatttctttg attactagta tggttgaacg 11280 
tcttgtaaat tatgcatatg cctgtttaca ttttgggttc acaatgtctt tctgattcat 11340 
ttgtgggaac tcttctcctt taatattgtt aataatggtt tttgacaggc attggtttta 11400 
aatatatata atcaaatctc cccctttccc acttttttaa ttgcttataa aaacttggaa 11460 
agtaattatc catcctgtgt ttaccctggc tctttcatga agtctataga attactttat 11520 
acacatttgc ttatttgata aacattctct ctgttttgta tgtgcttaaa gaagctgttt 11580 
tggggctcat acctgtaatc ccagcacttc gggaggcgaa tggatcactt gaggccagga 11640 
gtttgagacc agcgtggcca acctggcaaa accctgtctc tactaaaaac acaaaaaatt 11700 
agccggctgt ggtggtgcac acctgtaatc ccagctactc gggaggctga ggcacgagaa 11760 
tcgcttgaac ccaggaggca gaggttacag tgagccgaga tcatgccagt gcaccccagc 11820 
ctggagtctg tctcaaaaaa aaaataataa taataaataa ataaaaagaa gaagctgctg 11880 
tttctttctc ttcctccctc tctctctcac cgccaccccc ccaccacacc cttttttggt 11940 
attcactttc atagtaactg aaattattaa agtgttctgt taccatgagt ctgttttgaa 12000 
aaaataggct gccttttaag cgatcttgag attttttttc ttttcttttt ggtatttgtt 12060 
ctgattctct gtagccttac agcatttcat tcctagttaa ttcatttaat tgctttggag 12120 
ttttttcttc acctagaagg tgttgcataa atatgtaaat tctaaggaat tctgtcttgt 12180 
gactgtcata caccaaaaag agaatgtgct gccctaagca tattttgcgt taaaatgaat 12240 
tttctgaata gcttgtccca tagttattct tttttgaatc aggtgataga atttttgagc 12300 
ttacaggtaa taaaagaaga tgcccttttc ttttgtatta gagtttccag gcttaacttg 12360 
gagtggagca tagattccta gttgacactg tggtttcttg ccacatttgc aggacctctc 12420 
tacctgtctc ttctacttcc ttccctgttt actctgctcc acccatattg ccttttttcc 12480 
tgtttcaaga
tgcctgaaat 
tgcataaatc 
tgccctttac 
tcatcatgtt 
gtatcccttg 
ggtgtatacc 
tgccaggtgg 
ggattcacat 
ttctctttgt 
cggggattca 
actttctagg 
aatagattaa 
tgtcacccag 
gttcaagcaa 
acacctggct 
ggtctcgaac 
aggcgtgagc 
cttgctctgt 
ctcctgggct 
gccacagtgt 
agttcttaat 
tttatttatt 
ctcagctcac 
agtagctggg 
aagtggggtt 
ccaccttgac 
ctgattttaa 
ttgggaggcc 
gtggtgagac 
gtgcctgtag 
tcgaggctgc 
tacagtctca 
tgaaagaaaa 
tccatagaca 
gatacagaag 
cctagttcag 
tatccttaca 
ttggtttgtt 
cgcaagacta 
ttggtgtgca 
ttaagaaatg 
agaatttatg 
tgccaccacc 
agttagtaac 
tgctctgtaa 
ttcttggagt 
attgataact 
atctgtttac 
attacctgtg 
aaaggttgct 
tatgtgattt 
agagcctcac 
tccacctccc 
ggcatgtgcc 
gttggccaga 
gcctcccaaa 
tctttttcac 
tgtgaaagag 
cacccacttc 
gacattgctg 
gactgaggtc 
tttatagcta 



acataccaga 
gttgttcact 
acacactctc 
cccacatcct 
ctaacatgct 
ctagaagcat 
aattgcctag 
catctggagt 
atcagagcat 
ctcacatttg 
gtattgtatg 
atgacttgct 
ttactttatc 
gctggagtgc 
ttctcctgcc 
aattttctgt 
tcctgacctc 
cactgcgccc 
tacccagact 
cagtgatcgt 
gtggctccca 
tttttaaaat 
tttagagaca 
tgcatcccca 
aacacaggca 
ttaccatgtt 
ctcccaaaga 
aacacaaatt 
aagatgggag 
cctgtctctg 
tcccagttac 
agtgagccat 
aaaaatatat 
aggcaataga 
tttaaagttt 
gatctgtttc 
agccttgttt 
agttaagact 
actttaccat 
tttaatatat 
tggaattaga 
atagcttaat 
aagtgtattg 
agctttcatg 
agtgtggtga 
tattgatatt 
aacctcaatg 
gataaattgc 
ctctccctta 
cgtattgtta 
gatctctaat 
cttttgtttt 
tctgtcaccg 
aggtacaggc 
accacgcctg 
ctggtctcga 
gtgctgggat 
tctaattcac 
tacgagattt 
ttggacccta 
acctcatgga 
tttggcacct 
tgaaagttgc 



aacattcctg 

tggatgtcat 

agtgaggctt 

gagctctctt 

ttctcagtta 

aagctcactg 

aacagtgcct 

ggttggtgca 

aaggaatgga 

tgtgaaaagg 

tacaaattta 

gtcagaaaga 

tttcagagtt 

agcggcgcgg 

tcagcctccc 

atttttagta 

atgatccacc 

ggcctttttt 

ggagtgcagt 

cctgcctcag 

aagtgctaag 

ggattttatt 

ggatctcact 

acctcctggg 

cacactacca 

acccaggttg 

gtggggattc 

aaggccaggt 

gattgcttga 

caaaaaaaaa 

tccagaggct 

gtttgagcca 

aaaaataaaa 

aatggaatta 

aaaaatgcag 

ctgggtagca 

gcttagaagt 

tgagtaaagc 

gagaaataac 

ttctgtggac 

aggagaggag 

gcattaatca 

tctgaggctt 

ctggtcaacc 

gccttaaaga 

gttccttggt 

caaatcccag 

tactaaaata 

cagagaaagt 

atagaataaa 

cttagagctt 

ctttttcttt 

aggctgaagt 

gatttctcct 

gctccttttt 

actcgtagtt 

tacaggcgtg 

tttactaatt 

gctgaatcca 

tgtggaggat 

tgctgggaac 

tttgaggtcc 

tttaattgtg 
cttcaaggcc

tttgactaag 

accttgacca 

ccccattcta 

cctatttttt 

gggcagggtt 

gtaagagaac 

gaagtaaaag 

gcacatgaac 

tacgagtgtt 

agtacacagc 

atatttgttg 

gtcttttttt 

tctcggctca 

gagaggctgg 

gacacagggt 

cgccttggcc 

tctttttctt 

ggtgcaaaca 

cctcctgagt 

attacaggca 

tatttattta 

gtcacccagg 

cttgagtgat 

tgcctggcta 

gtctcgaatt 

caggcatgag 

gcggtggctc 

gcccaggagt 

ttttttttaa 

gaggcaggag 

ctgcactcca 

tacagtataa 

agattccatt 

gagctgtaat 

tcgtaatgtg 

ctgattttgt 

tcatttgact 

tattactaag 

agacctcgct 

ttgcttagat 

ccatgatcga 

ctttctccct 

ctcatccctt 

attagcatca 

tttaactcta 

ttactgacaa 

tcttttagct 

ctgaaggctg 

agggcctaga 

ttaataaaag 

ctttctttct 

gcaggggtgt 

gccttagcct 

gtatttttgg 

cgagtgatct 

agccaccgcg 

tgttcatagg 

aaaaacaagg 

ctgtccaagt 

aaagccaggt 

ttttttccca 

gagtcctctg 



tctacattta 

tttctcacgt 

ccttatttaa 

ttttcttttt 

atgtttgttg 

ctttgtctgc 

ggtcctcagt 

aaatgatgat 

ttttacctag 

ccaaaagatg 

aatgattgaa 

ttctgttact 

ttggagacag 

cttcaacctc 

gattacaggc 

ttcaccatgt 

tcccaaaggg 

cttttttttt 

gggttcgctg 

aatttgcacc 

tgagctactg 

tttatttatt 

ctggagtgca 

cttcccacct 

attctttgta 

cctgagctca 

ccaccatgcc 

acacctgtaa 

tcaagaccag 

ttagctaggc 

gattgcttaa 

gcctgggtga 

aaatacagtt 

ccagatactc 

gtaagcttga 

gcctattgtg 

agattcatca 

gatgttatga 

gctagaacac 

gttggaagcc 

acatgtttaa 

atggtcctgc 

cacaaatctt 

ctggaggata 

cccgtctgtt 

atagaaatct 

cagtatacat 

gtcactacat 

gttctaacct 

gttacggcat 

ttaaaaaaat 

ttcttttttt 

gatatcggct 

cctgagtagc 

tagagacggg 

caagtgatcc 

cccggcttct 

tgagctacat 

gtaatttgcg 

tggcagttac 

atggtaggaa 

gttaagggtt 

atcctttgac 



cttctcctct 

tcttcaagtc 

tattgtatct 

tccatcatac 

tttattatca 

tttatttagt 

gagttggatc 

ggctttggat 

attatttggg 

taatatattc 

gaatcatctc 

ttactctcat 

agtctcactc 

cgcctcccag 

tcctgctacc 

tggccaggct 

ctgggattat 

aagacaaggt 

cagccttgac 

acaggcatgt 

catctggcca 

tatttattta 

gtggcgtgat 

cagcctcccg 

tttttggtag 

agcaatctgc 

cagcctttaa 

tccctgcact 

cctgggcaac 

atggtggcac 

gcccaggaag 

cagagtaaga 

agagactgaa 

tgaaaacatg 

atagctccat 

atgaatttac 

agaaaaggac 

aaccaacttg 

tcaagccatt 

tgttggttgt 

ttgccccagg 

caatgagaca 

ggataggtcc 

catcgtccat 

ttgtcttctc 

tatttccgag 

tttaaaaaac 

tccagagata 

attcagtgta 

ttcactttgt 

tttttttctg 

ttattaagac 

cactctaacc 

tgagactaca 

atttcaccat 

gccctccttg 

gtatgtgatt 

ggaaatttac 

tgtgcgtgaa 

ttcctacaca 

atagagtaat 

tgaggccaca 

tgtgctgtaa 
cagtgagggc

ttagagctct 

atcatgtata 

aataaaaatt 

tgatggggtg 

agagcaattg 

gattgttatg 

ttgtcacgtg 

caagtagccg 

agaccaacct 

gatggagatt 

ggggggtgct 

accctacacc 

aagaaaaaaa 

agtcttgcag 

ttagctgcct 

agtgggatta 

gttctggtgt 

aaatgtcaca 

gctatattac 

agcctgatag 

tctggcacag 

tttatgttga 

tcaagttagg 

tttagacacc 

tgaaaatgat 

tcccagcact 

tggccagcat 

ggcatgccct 

tggaggttca 

tgtctcaaaa 

aagcactttg 

ggtcaacatg 

gtgccggtaa 

cagaggttgc 

acttcgtctc 

cagaggttat 

cattgtatgt 

ttaatcccat 

ctgaagcttt 

catatttaaa 

ccacaatcca 

tttctctatc 

tatatattaa 

tttttgtctg 

ttatcaatag 

attaattcaa 

gctgatgtga 

aatacttagg 

gcacggtggc 

ggtcaggagt 

aaaaaaaatt 

ggtgggagaa 

gcattccagc 

aaataaatgt 

tgtgattgtt 

ttttatatat 

ttttttgaac 

aaaaattttt 

agctttgaac 

attgcacctg 

gccatattta 

ttctgcctct 



ttcagagtgt 
cattgagttt 
tatgcggctg 
aaatgcttca 
cctttggaat 
tgtaagtgac 
cttataatgt 
gtcaccttta 
ttcccacgct 
ttccactgag 
ttgagaagtc 
cttttactta 
agttttctgt 
caatccagga 
gagaaagttc 
ttggtatgtg 
aatgtttaag 
tcttcaaata 
gactttagcg 
tttgggcaaa 
gattgttaaa 
agtagccact 
atactactct 
tttattgtac 
actgtcacta 
ttttaaaaat 
gtgggaggcc 
ggtgaaaccc 
gtggtcccac 
gtgagccaag 
aaaaaaaaaa 
ggaggctgag 
gtgaaaccca 
tcctagctac 
agttggctga 
aaaaaatata 
aatattagtg 
gaaggtgttt 
aatcatagct 
atttttttac 
gcgtaaaatt 
gataatgata 
ccttcttctg 
tttacatttt 
gagtctttca 
ttccttttca 
ctgttgatgg 
acatttgtgt 
agtaggatga 
tcacgcctgt 
ttgagaccag 
agctgggcgt 
tcgtttgaac 
ctgggtgaca 
caaattgttt 
ccagttgctt 
ttttataaca 
cgtgataatt 
ttttagagac 
tcctgggctc 
ccgtgtgtac 
aaagctctaa 
aaatatttcc 



taagtatctt 
tactcttagg 
tggtagtaat 
ttggtttcta 
tggagagtaa 
aatcttgcaa 
ggagcctgct 
tgtttattta 
gtgtttacga 
aaggtaggag 
ccttttgttg 
atggagggtg 
ataggttctt 
ctttgggtct 
tttcttttct 
tattcttttt 
caaatgatat 
cctttctttg 
ttggattgct 
ctacttaacc 
aagattaagc 
caataatgtt 
atctgcatcc 
tggcaattat 
attattaaaa 
atactcctgc 
aaggcagaca 
agtctctact 
tgctcagagg 
atcgtgccat 
acacactcct 
gcgggtggat 
gtctctacta 
tccagaggct 
ggttgcgcca 
tatatatttc 
ctatttatta 
tttgcccttc 
ttcctggtaa 
agtgttattg 
tggtcagttt 
atttttattc 
tggcaattcc 
ctagaatttt 
cttagcatag 
ttgttgatga 
acttttaaac 
acaagtcttt 
ttgagtctta 
actcccagca 
cctggtcaac 
ggtggtgggc 
ccaggaggtg 
gagcgagact 
tccaaagtga 
aaagcttcac 
tactctagta 
gctatgaatt 
acaagggtgt 
aaaggagctt 
tgtttaaact 
tacttttagt 
tgcttctatc 
ctcatctaaa
ttttgatcaa 
tgattctgga 
aagggatagc 
cttgatgggc 
tctgtgaata 
gtccatttca 
ggacagtggc 
ttgttttcac 
agtttcagtc 
ccctctgctt 
ttttatacct 
aaatgaagtt 
caaaaactac 
tcttcttgac 
tttttttttt 
gtttcattcc 
cttattgatt 
tgctttgaat 
tctttgttcc 
gttaatatag 
aattttattg 
ctaagaagga 
aaaagatcct 
tcacctttta 
tgggccgggt 
gatcaagagg 
aaaaatataa 
ctaggcagga 
tgcactccag 
actgggcatg 
cacctgaggt 
aaaatacaaa 
gaggcaggag 
ttgcacttca 
ttatatagat 
gttagtgttt 
tcccatgatg 
ttcattttgg 
aggtattatt 
tgacgtattt 
tcaaaagttg 
cgggcaattc 
atataaatgg 
tgattttgag 
gtattgcgtt 
tgttgccagt 
gtgtggacat 
ttgtagctcc 
ccatgggagg 
atggtgaaac 
gcctgtaagc 
gaggttgcag 
ctgtctcaaa 
ttgtaccatt 
ttttagtaag 
aattatgtca 
ttgtgcatgc 
tatatgttgc 
cctgagtagc 
aactgataga 
tttttccata 
ttgcttattt 



cagtacattg 
gtcatggtra 
ataaatgagc 
tggagtatga 
tcacaaataa 
agtttaagac 
actctcatct 
agctacaaac 
ccagaagaaa 
tctaggcttg 
tttgtggaag 
ctgcttatca 
gtaagataga 
tcaatacaag 
tgaacctttt 
ttgagctatg 
tggaaaataa 
tttcattttc 
tctgattctg 
tcagtttccc 
gttaagtgca 
ttattgttag 
tgcacctgga 
ttgccttata 
tttgttgact 
atggtggctc 
tcaggagttt 
aaaaattggc 
gaatcgcttg 
catgggtgac 
gtggctcaca 
caggagttcg 
aaattagccg 
aatcgcttga 
gcctgggtaa 
gatcaaataa 
atctgaatgc 
tcgaatctac 
atacttttgt 
gaagtataca 
gtatacctgt 
cctcatatct 
cggacctgcc 
aatcatagag 
attcatctat 
atacagatac 
ttttgattat 
ttgtttttgt 
ttaagaaatg 
ctgaggcggg 
cccgtctcta 
tcagctactc 
tgagccaaga 
aaaaaaaaaa 
ttacattccc 
ccatttaatg 
tgtttatcct 
tctttttttt 
caggctggag 
tgggactaca 
tgttgtcttg 
ccttaatctt 
ttgaattgcc 
tgttctttat caggtcagta aaatcagctt ggtggatcta gcaggaagtg aacgagctga 20100
ttcaactggt gccaaaggga ctcgattaaa ggtatttatt ttagcaaata aatggcctga 20160 
tcaataaatg gccactacct gtttttgtaa ataaaatttt attgaaacac agctacatac 20220 
atttgtttat gtattgtcta tggctgcttt tgcactacaa tggcagtgtt gtgacagaga 20280 
ccgtatggcc cagaaaacct aaaataattt gtgtctgtcc ttctacggaa aaattttgct 20340 
gactcctgat cngtaacttc aatggaagga taagttatct aactaaatag ttagggagca 20400
catcatagtc agggtgagag ttgaattgca ttttaaggaa caggtagaat ttggctagat 20460
ggaattaaga gatggcagtg ggaataggat atacaagatt cagatgtgag aacagaaatt 20520 
tcatgtcgat aagttgaggg aagaaggctg cacacatctg taagcttttg gagcacaaat 20580 
atctcgaaga tacttcagga ctgtgagtta ggcatgtggt ctcagcatgt tattatcaag 20640 
tggaaaagag gacagatgga ggactaaagg ttgtttttag aacttgtaat aaagaaatat 20700 
gctagcatat ggcaaatcat attttatgtt ctgtatgata tagcagtagt agaaattctt 20760 
catcattctt tcatatttgg tatttatttt aggaaggagc aaatattaat aagtctctta 20820 
caactttggg caaagtcatt tcagccttgg ccgaggtggt aagttttata cttcattata 20880 
aggggagttg tgtctgttca gcaaatatac ttgatgatat tttattattg tttataggct 20940 
gtctttttgt gcttttctgt aatctgcctt tctcctttta gagcttttca gataagatac 21000 
taggtataac acagtttgac tttgtactga aagtaaaaga tactaaggag aacaaaagta 21060 
aatatagatc tgtaaaaact agagcaaaca ctgaaggttt gactcatttt tttgagcctt 21120 
attttatgtt tctcacaaaa accaagtata tttattgtct cttcttttcc tgtataatct 21180 
tttctgagta agatgcagta tcttttggag tattttgagc agtttggctt tggtgaatgc 21240 
tgttgatttg agggccagta gttttttttt tttttttccc ccaatagttt tgttctttcc 21300 
tactccttta tggtagaaca gtaaaactaa aactattttt tttaaattag gggctttgtt 21360 
gctacattat ttaaattagt cattgtgtgt tctagtttag tttaaaacct ctttttaatt 21420 

gctaaaaaaa tgaaatgctt tcttgcttgc cttggagaaa tcataatcta tgacatgaaa 21480 

tttaaacaaa atctaatttt ctacttggag gcttatcctg tgttcttatt ttctccttta 21540 

aatgctttca cctgtaggat aactgcacta gcaaggtaca gtggggattg gtagagataa 21600 

actagaattg acttttatgt tttaaatcct cactaggatg tatggaggca taagtaggaa 21660 

tggaaccttc aaaaatcttt tcatcatttg ttctctggct ctggaattta gagtggctgt 21720 

aaatttaggg tgaccacaaa tcatgaggtt tcacaccaaa attaatttct agaacatgat 21780 

atgctttgga gaaggaatta agaataaagg aacaaggccg agcatggtgg ctcatgcctg 21840 

taatcccagc actttgggag gccaaggcgg gcggatcatt tgaggtcaga agtttgagac 21900 

cagtctggcc aacatggtga aactctgtct ctactaaaaa tacaaaaatt agctgggtgt 21960 

ggtagcgcgt gcccatagtg tcccaactat tcacgggttt taaactgaag ttttaggcct 22020 

caaggattac atagtggagt gaaagtttgt tccactaatc tgaatgactg aaaggcttaa 22080 

ttctcccttg aggtctgtgc aatctaaaat tgaacctgct attggaaatg actttctgat 22140 

tggtggattg tgtgtctgtg ggtagagtgg aatgagatga tgatgatgtg gacatattta 22200 

ataagagatg gaattaaaat gtttcttgcc tactcccttt cctcctcctc ctcaaaaaaa 22260 

aaaaaaaaaa aaaaaaaaaa aacccaacaa acaccacaag actagccact aagaggccag 22320 

ccatgtagta ccctttttct gaatagaaaa agtgcttgtg caatagtagg atcatcctta 22380 

gtttaacctt ggattatacc aggactgaat tataaaactg ctgatttgct ggagtggtgt 22440 

ttagcagtga gaccctggct ttctccttag tagcttcttt tttttttttt ttttttggag 22500 

accaggtctc tcactgctgc ccaggctggg gtacagtggc acgatctcgg ctcagtgcaa 22560 

cctccacctt ctgggttcaa gcgattgtcc tgcctcagcc tcccaagtag ctgagattac 22620 

aggtgcccgc caccaccccc ggctaatttt ttgtattttt agtagagatg gggtttcatc 22680 

acgttggcca ggctggtctc gaactcctga cctcaagtga tccacccacc tcggcctccc 22740 

aaagtactgg gattacaggc atgagccact gtgcccggcc tccccttaat atcttcttaa 22800 

agtgtcacaa cgcagtagtc agccagcata tatttgagta gctgttacct gtaaagcatt 22860 

tgcattaatt tttactttta gagaatataa ttttttttca tttaagcact gccttttcat 22920 

tggtgctttt tgctttgtcc tgtggtaata agaaggatca tgaagaaagc agaataagta 22980 

gacttaatag aacattcatc tttctaaatt accaaatcat atctctttct tagacctttt 23040 

tacttttagt ctttgcttta ataattggat tatttttggc ataatttcct ttcaaagaga 23100 

ccaaataaat attattttac aagagttact agatttgcaa caacccaacc aatgacatgg 23160 

tggttggcca agtaaattca ctaactagat gacttgctcc tacacatcca acctggaatg 23220 

cacaagatta gtattttccc ctgtgcaaaa ataaaaattg aggatggggt atttgatgat 23280 

gttagggaat tattgtttat ttaatgcaca agattagtat tttcccctgt gcaaaaataa 23340

aaattgagga tggggtattt gatgatgtta gggaattatt gtttattttg gtaagtgtga 23400 

taatggtcct gtggtgtttt gttttgtttt taaaaaacat gaccttttcc ttcacagtaa 23460

gaaaaaggat aaaaaatgat atggacagga tattaggtaa tagtattgtg ttgatgttta 23520 

attttttgaa tttggtcatt gttttctgtg taagagaatg ttcttatatg ctgaagtgtt 23580 

taggggtaaa tggagatgag gtccgcaact gacttcaaaa tgggtcagga aaataagttg 23640

tagaaaagtg tttgggggat aattgaaaga tgattggcag agtattttgg taatggttga 23700 

agctgcacgt tgggtatggt agctcattat actgctctct caactgttgt gtatattaca 23760 

cagttttcaa aataaaatgt aaaatgaaaa aattacctta ttgcgtgagt ttttggtttt 23820 
gtgaagaatt gaaactatga atgataaact tccacagtgt cggggttgta ctgtaattta 23880
atttacttgg gaattttaga ctgatttgcc tttcttggga atttttttcc ctaacgaaaa 23940 
atgctaagac catttctttt cctttaatag agtaaaaaga agaagaaaac agattttatt 24000 
ccctacaggg attctgtact tacttggctc cttcgagaaa atttaggtat gttgaccact 24060 
agtgaaaagt agttatcttt ttaactttga ttatcttttg tggttaattg tcctgtgtct 24120 
gttttggaga caaataatct ctagatattc atgttactac ttgagtgttc cttgcctgct 24180 
ttcattttta ctcctaattt cagttggttg gattcccttg gttttaagcc agggtatact 24240 
cccatatctc attttcttca gggcatgaac tgtgcagtag ttgtcctccc gactccccat 24300 
acttgcttca gattattgct ctttcttcga agttatttca aaactccatg cttctctgag 24360 
gtgcacaaaa ttagacattg ccaaccctta atcctcctcg ttgctgatgt tttcagacct 24420
tcaaatcact gttcttcatt cacttctcct actttttttt tttttttgag atggagttcg 24480 
tgctgttgcc caggctggag tgcggtggtg caatcttggc tcactgcagc ctccgcctcc 24540 
tgggttcaag cagttctcct gcctcagcct cccgagtagc tgggattaca gacacacacc 24600 
accttgccca ggtaattttt ttgtactttt agtagagaca gggttttgcc atgttggcca 24660
ggctggtctc gaactcctga cctctggtga tccgcccacc tgggtgaaag agtgagactc 24720 
cctctcaaaa aaaaaataaa agaaagaaat ttgacattac attaaatctt aggatctgag 24780 
attacaatat cagatttgac acattccata aaatttttct tctaaaatga gtgtgatttg 24840 
atactcatga ttaatctttt taggtggcaa ttctcggact gcaatggttg ctgctctgag 24900 
ccccgcggat atcaactacg atgagacttt gagcactctg aggtactttc ttttgatctc 24960
agtaacaaca tagaccacat tgccatcaga agccattttg tgataccatg gatgttttta 25020 
tgctatctgg gtagttattt atgtaaataa tgttcctttc ctctcataac agggaaaact 25080 
tagggaaaaa gtgatttgta ttataataga atttacctct ttatggtttt cagtttcact 25140 
gggaaagata tgaacttttt atttttttat ccctcatgga tacctagata aggcaggagc 25200 
tttttcatag gaaagatttt taaattgctc aagtgagatt tttgggtaac ttttaggtaa 25260 
tagtgacatt ttcttattca tttctataaa aagaagtgtc ccccttcttg tagtgtttgt 25320 
cgttatagaa atgctgaagt cggccaggcg cagtggcccc cacctgtaat cccagcactt 25380 
tgggtggccg aggtgggcgg atcacttgag gtcaggagtt tgagaccagc ctggccaaca 25440 
tgatgaaacc ccatctctac taaaaataca aaaatgagtc gggcgtggtg ccatgcacct 25500 
gtaatcctag ctactcagga ggctgaggca ggagaattgc ttgaacctgg gaggcggagg 25560 
ttgcagtgag ccgagatggc gccactgcac tccagcctgg agacagagcg agactccatc 25620 
tcaaaaaaaa aaaaaaataa attctgaagt cataataagt aataaaatta taggaatatg 25680 
tagcaatatt aaaaaattac caacatagcc aggtgcagtg gtgtgcacct gtagtcctag 25740 
ctacttggaa agctgaggca ggaggatcac ttgagcacag gagtttgagg ccagcccagg 25800 
caacgtagtg agactctgtc tcttaaaaaa aaaaaaagaa aaaaacaatt accaacagaa 25860 
tgttcataaa aagttttttt tgtttgttag tttttgtttt tgagtcttgc tatgtggccc 25920 
aagctggagt gcagtggcaa gattgtagct cgctgcagcc tcagacacct gggctcgtgt 25980 
gattctccta cttcagcctc tcaagtagct aaaactatag gtgcatgcca ctaggcctgg 26040 
ctaattttta aaaatttctt gtagagatgg ggggtcttgt taggttactc aggctggttt 26100 
caaattcctg gcctcaagtg atcctcccca cttggcctcc taaagtgttg ggattacagg 26160 
tgtgagctgc ctcccgaggc ctgtattttt ttatgtatga gtaggtttca ttactttagt 26220 
ctggaactct ctacccttct tccatgaaag acacttgatg taatggaata aacactatac 26280 
agtctcataa gttgggttta agtcctctcr ctgttatttc cattgtgatc ttgggcaaat 26340 
cacttaaatt catctgagcc tttttcttca tctttaatag gtactataaa gctttataca 26400 
aatgtattat tatcaaggct cataaaatgt aaacactcag gatatttgat ttagagataa 26460 
tagtatgtta cttatgagaa gtagaacata tgagaaatga caagaacaaa ttttcttttt 26520 
ggatctagat atgcagatcg tgcaaaacaa attaaatgca atgctgttat caatgaggac 26580 
cccaatgcca aactggttcg tgaattaaag gaggaggtga cacggctgaa ggaccttctt 26640
cgtgctcagg gcctgggaga tattattgat agtaagtgaa ttaaggatcg ttacaaaatc 26700 
taatcctttc ttcttcaggg ttcttattca gcgttcttat atttaaaata aacttcaagt 26760 
taaggagcat gatgaagcta aatggtagtg aaaatgtttt gttttgtttt gttttctttt 26820 
tccttaaaaa acccaatgga aatgatctct agattccttt atgtaatgtg taccagacta 26880 
tccatttgaa ctcaggaaat tagttttaaa tcgcaaatat gtcactagta tttttgttag 26940 
aattctgttt tagatcagca atgaagttta tttggaacgt taaacctgtt atcactgagc 27000 
aattaattac ttcatagaaa tgttatttga aattttgagt tttgtttcat gggttgctct 27060 
ggattagtgt cgtggcagta atataagaaa cgcaccattc aagcctacct tcagataaga 27120 
ttcacctttg aatggccccc tcctcctgta aaatgctgga gtgccccgtg ctgcctatga 27180 
acatatgaag gggaaaagtt caattcctat gaagttcaag aactacactt ccagagagta 27240 
ttgttttaca atgttgtatt aaaatttatc ttaaaaaatc tccgtgatgt gcctatccag 27300 
tggcttattc agaagttctc ttaagtgagt aaaatggatt catataggaa tagaatatgg 27360 
tagaatatgg gttagtttag taaagcattt ttttttagtg taaagttaaa cttatattcc 27420 
ttgaagttct aataagattt tcaaggcctc ttctagttta attcttctta ttgattagag 27480 
tccgagttta gatgaatacc gtttgtaaaa atgataccgt attctaagta ttaacttttc 27540 
catctcccca aatgccaaaa actgctctgt tccctctctg tgtcactgct gaacctgaac 27600 
tttgaccctt ctcgtatttt ccctcctgct taccttcagt tgatccattg atcgatgatt 27660
actctggaag tggaagcaaa tgtgtgtatt tcacatattg gttatttcca gtactctgac 27720 
ttgctaatct gcctctgctg gatcagcctt aaaataagct ttgcatgctg gcaatcctta 27780 
cagggaaatc ccagacagag catagtatag aatcagaaat gatcctattt ttgcccttct 27840 
tcatttagca aactaatctt tgctgaattt cagtttgtgg cttgagtgaa gttgaaaggc 27900
ttaagaatct cttgctgctt ctactcttca gtttttccca ttgtgattca attaaatcca 27960 
aaaatcatag atgatagcaa ctgacactta ctgattttgc ataatatgcc aggccctatt 28020 
ctaggcacct tgaatatatt gtccctttta attctgttaa tggctctgaa aatattttat 28080 
tatgttcctt ttaccaataa tcatttttag tcagcaaaac tatctatgat ttaagagctc 28140 
tgtagatgtt gggctaagac tcatgagtgc ttgtgctaag ttgcttttct ccataacttg 28200 
ggcctcggaa tgattagata aacgtttttt cctttttttt tttttttttt tttttttaga 28260 
cagtctcact cttgtcaccc aagctggagt gtaatgacat gatcgtggcc cactgtaacc 28320 
ttgaactctt gggctcaagt gatctttcca gctcagcctc ctgagtatct gagactacag 28380 
ttgttcgctg ccacacctgg ctaatttttt aattttttat ttgtagaggt gaggggtctc 28440 
gctgtgtttc ccaggctggt ctggaacccc tgacctcaag cagtcctcct gcctcggcct 28500 
tccaaagtgc taggattata gatgtcagcc accacaccca accattcctc ttttcataag 28560 
tcacaaatta aaataactta ggtgtctaaa atgactcaaa catttttaaa gagtcacatg 28620 
aacctcatta ttctctccac atatatctta agttggtgag gaaattattt cttacccttt 28680 
taaggtaatt catcaagaaa ttgttatctt agaagaaaaa tgtcttgatt ttggggctaa 28740 
tttttagatt tgtactgagt tatttctttt tttttctttt ttcttttctt ttttgagatg 28800 
gagtcttgct ttgtcgctag gctggagttc agtggcggaa tctcggctca ctgcaacttc 28860 
cgctgcctgg gttcaagcga tttctcctgc ctcagcctcc cgagtagctg ggactatagg 28920 
tgcgcaccac cacgcccagc ttatttttat atttttaata gagatggggt ttcatcatgt 28980 
tagccaggat ggtctcaatc tcttgacctc gtgatcctcc tgccttggcc tcccaaagtg 29040 
ctggcattac aggtgtgagc catcaagccc agccacctga gttgcttatt tctacagtaa 29100 
ggcatgtaat gaggcttggc agacagtgag aaaatgcata ggagaaacca aacatagccc 29160 
tggctgtggt tatgaggtta gggcttaaag gcactataat gtcctgatac agaagtgact 29220 
cttggactgc aacagctctg tctgagattt ccaaagctca acgatactgg tgtggaaaag 29280 
gtgccttctt ctttttgctt ttgtgctcaa ggtgttgctg ctgccagctt gttgcctacc 29340 
tcttttctta gcagctgcgg catttcctga aagaccatgc atgactaagt tttaattctg 29400 
tattcctgag ggaaagcagc cctgactcct tctcagtttc cattttgatc tatttctctt 29460
tattcctttt agggcttgca tgtgaccatt ggacctaaac tctgtccatc ctggtctttt 29520 
ctcccaatgg ctaatttttt gctctaatcc ctatccccct tttaaaagat gcctctttgc 29580 
ttgttgccct atgttctgta tttgcagatg gtggttttca aactgtcttg aagcaaccac 29640 
tggctttaag ggagttaaaa aactggcttt aattaaatag tagagatgag taagcccagg 29700 
gagaactaat agtatttttt tccagctatg tgtattgact catctccttt gaggaaaact 29760 
gacttagttt tgatatgaat tttagtttct caggtttaaa aataagctaa ttttaaagtt 29820 
ggattgcacc aatacttaag atttaccttt ggtatctttt tgcaaaggat gactaatgaa 29880 
agaaagtaat gaaagaaagt acaaacagca aattaattaa ttttgttgtt tgtggcttaa 29940 
tttttttttc ttttaccaat actgagtgta aaagcaaaat taacagttct attaaagact 30000 
tagtaaatgg aagtatgttc ctaaaacttg cattttatac taggaagcag gcttagcagt 30060 
gacccatggt tgtactgtta ggctaattaa gggccttaaa ttagcaaata atcagcctta 30120 
ttttttgctt gctctgacag aagaatggaa cctgcactaa aactcattaa tctgctaatt 30180 
gttcttccct gttgaggcaa tcaattagtc ctgagttttt aaccaggact tatacaaaat 30240 
tcctctcatt tctataatta cccctttttt atttaagaca tattaaaaat ccttttagca 30300 
tgttgaaatg ttgccataca aagcatacta ctcagtgcac aaattttgtt ttaactgtaa 30360 
cattttcttg gaaatttttg ctgaattaca cgtaacactg agccagacca atatggatta 30420 
ttattgataa ataaatgatc attgttttcc cttactgtta caaactactg gaaaagtgta 30480 
aatctgaaga cattgtagac taggaacact agaaagacta aatggaaggt agtttggctt 30540
ttttagggag agtgcctaag cagtatcctt tgaggctctg gtttatcccc ctgcgagata 30600 
cagcgttaac ctacattgga atcttttagt ttgtaaatgc agaaatgatt ctgaacctgt 30660 
aggataccac tattaaagac agacatgttt tagaactgcc atctagaaca gttgtaacct 30720 
gtgtttgtat tatagcgtgt cttggtcttg gcgctttaat gctgtccttt cttacaacga 30780 
tattccttcc tgtctttttt tcttcctctg cttctttccc ttttccctac ttttcctgcc 30840 
ttctcttctt tctatctccc agatctgaaa gattttcaga acaataagca tagatacttg 30900 
ctagcctctg agaatcaacg ccctggccat ttttccacag catccatggg gtccctcact 30960 
tcatccccat cttcctgctc actcagtagt caggtgggct tgacgtctgt gaccagtatt 31020 
caagagagga tcatgtctac acctggagga gaggaagcta ttgaacgttt aaaggtaagt 31080 
aatagttcag actgaataca aggtattcta trtagctcca caaggaagaa ctaggagtaa 31140
aaatcactaa gatttcgact cagcatatgg agaactcttt gactcttaga agtgtcctga 31200 
aaattgaatt ttgtgctttg taagttaatt tcttttcatt agaatgcctg agtctgacat 31260 
ggccaggcag agttgggaga agtgcatggg cagttcatag gctggcaagg cagagtaaca 31320 
gattaataga tgtgtacgtt aattctggga tagtacatca agttacagtg taattgtttt 31380
gttagaaatt

cttctgttaa 

actttgggag 

acacggtgag 

acctgtagtc 

gagcttgcag 

ccgtctcaaa 

tccaggttta 

tgccctatga 

agggccatag 

aggtatttcc 

tgaaagttaa 

ttcagtgcac 

atcagatgac 

tcacctccag 

gtaggggaca 

tgccagccag 

ggacgataga 

tctttaacac 

ttttcatgtg 

aatcagccta 

atggtggctc 

gctcaggagt 

cagagcaaga 

tagctcacac 

aggagttcaa 

attaagccgg 

cagatcactt 

aactaaaaat 

gagggtgagg 

gcgccattgc 

aaaaatagaa 

gctgaggcag 

ccaatgccct 

aaaataatgt 

aacacctagc 

gtaactttta 

agtggtaaga 

cagggctctt 

ctcccactta 

actatttgct 

ctcttgatgt 

tcacatgtta 

tccaaataat 

ctatgtccca 

ttttaattct 

aaagaaaaca 

tcactgaagc 

gggcagacct 

aatggttgtg 

ctgcaggaga 

acattttatt 

ttttacacat 

actcatgtca 

cccaggctgg 

agcaactctc 

tggctaattt 

aaacttctga 

atgagccacc 

agctctcgtg 

tagcctataa 

agcagcccaa 

aaccttggtg 



tatttggaaa 
taaagatgaa 
gccgaggcgg 
accctgtctc 
ccagctactc 
tgagccgaga 
aaaaaaaaaa 
ttcagatagt 
ttactgtaca 
tgctgcatgt 
cagccccttg 
gagcagcaga 
agacagttgt 
tgtgacagca 
gaagatagtt 
cagggaggac 
cagtcttctt 
gaacagtgac 
taaagtgaat 
aggctgaaaa 
ccttttccct 
acacctgtaa 
ttaaggctgc 
ctctatctca 
ctgtaattcc 
gacccgcttg 
gcttagtggc 
aaggtcagga 
aagaaaatta 
cacaagaatc 
attccagtct 
aaattagccg 
gagaatccct 
ccagcctggg 
gcttgtttct 
aaagggagaa 
ctgttggtaa 
ctccattcac 
ttcagagcca 
ctgagtcctt 
tattaaagcc 
aaaggagaaa 
ggagtcaaga 
ttctagggaa 
ctaaagcaag 
tccttacgcc 
gcttccttag 
ataagcacct 
gcattttaat 
cagtagaaat 
aaaatattat 
tccctttgtt 
gatgttgctg 
caccattgta 
agtgcagtga 
ctgcctcagc 
ttgtattttt 
ctacaggtga 
atgccaggtc 
tttattagtc 
atatccacaa 
gggccacatt 
attcaatggc 



cgtggtttca 
gtttgggccg 
ggtggatcac 
tactaaaaat 
ctgaggctga 
tcgtgccact 
aaaaaaaaaa 
tattaaatta 
ggaaggttat 
atgcagatta 
ctggctctgt 
gccttctgac 
cacctgcagt 
tcatctgctt 
tgacccatac 
aaaaaacact 
cacagtcact 
acattgacac 
taagcttgat 
agattttata 
tgacatttta 
acccagtact 
agtgagctgt 
aaaaaacatg 
agcactttgg 
gccaacagtg 
tcatgcctgt 
gttcaagacc 
gccaagcatg 
acttgaactt 
gggcggcaga 
gccatggtga 
tgaacccagg 
caacagagtg 
tccttcagaa 
ggagtgagaa 
aattgacttt 
acattttctc 
gagtgagaaa 
aatataaaat 
ttagctcwta 
tttctcccta 
cattaaccag 
taacactgtc 
acaaaactgt 
aaaataaaga 
tgttttgtgt 
tagttctgct 
gctgcacgca 
gcttgagaat 
aatggaaaac 
tctttttgac 
tggctttgtt 
aagatgtttt 
cgcgatctcg 
cttctgagaa 
agtagagacg 
tctgcctgcc 
ttcctgtcaa 
tggcatcaca 
atatccatta 
tggaacccac 
ttcaacaaat 
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ttgaaagatg 
ggtgcagtgg 
gaggtcagga 
acaaaaaatt 
ggcaggagaa 
gcattccagc 
aagataaagt 
tatcttctag 
ctcctcctat 
ctggcctatg 
caggttcgat 
tcatagctgc 
gctgaaagct 
tgaagcaaat 
cttttagcca 
ttttaagagt 
tggtattccc 
atttaatagg 
aattgaatat 
tgaagaaaga 
cttacttcaa 
ttgggaggct 
gattgtactg 
ctttagaaac 
gaggccaagg 
aaaccccatc 
aatcccagca 
agcctggcca 
gtggcacacg 
gggaggcgga 
gtgagactct 
cgtgcatcta 
aggcggaggt 
agactccatc 
catgtaagtg 
atctttccat 
gaaattttac 
acaataggcg 
tagggagctc 
gttctttttc 
cacttgagat 
actgccaagt 
actcttcaag 
attattcagg 
attacttcaa 
actcttctgt 
tgtcttgagt 
cttattttgt 
gataacatct 
atctaggaac 
agcaatagag 
tttaaaaaag 
cactgaatag 
ttttttttga 
gctcactgca 
gctgggatta 
gggtttcgcc 
tcggcctccc 
ataaggtttt 
gtcagtactg 
cagaagcagg 
ttaggaagca 
gttctgtgta 



gaattctgtg 
ctcacgcctg 
gatcaagacc 
agccgggcgt 
tggcatgaac 
ctgggggaca 
tagaacattt 
aagttcacag 
gctggtaccc 
gctcccagct 
cttatctctt 
ttctcttcct 
gttcaggaat 
attttttctt 
ttgtgttcaa 
ctgagaagtt 
cttgactgcc 
taaaaggctt 
gtcagatgct 
ggttctgctt 
aagtacagtg 
gaggcaggag 
ctgcactcca 
aaaaataagg 
tgggcggatc 
tctactaaaa 
ctttgggagg 
acatggtgaa 
cctgtaatcc 
ggttgcacag 
gtctcaaaaa 
tagtcccagc 
tacagtgagc 
tcaaaaaaaa 
gaaagtgagt 
taaatgttgt 
taaagacctt 
tcagtgacca 
cttgggaatg 
tttggggatc 
taccctttcc 
gggaccaatt 
attcaggtaa 
gaactctcct 
tcagaaaaca 
tttccctagt 
ttttaatgca 
gaatggtgag 
ttcatccgta 
tgggttccta 
gaggttagtt 
gtcagttgat 
cctttttcca 
gacagagtct 
acctccgcct 
caggcatgta 
atgttggcca 
aaagtgctgg 
aaacatgcag 
tgagtgccgt 
caagggcaaa 
tgctatgttc 
gtaagtgttt 



tagagccatg 

taatcccagc 

atcctggcta 

ggtggcgggc 

ccaggaggcg 

cagtgagact 

gagtagctgt 

aaagccttga 

atggggtcac 

ttgctgttgg 

tgtatgaata 

cctcaattaa 

cctgccaccc 

tccctacttg 

gagtgctagg 

gcagtcatgc 

attcatgcaa 

gccgaaaaac 

ttcattttaa 

tttctttctt 

tgggccgggc 

gatcatttta 

gcctgagtga 

ccgggcacgg 

atctgaggtc 

atacaaaaaa 

ctgaggcagg 

accccatctc 

cagctgcttg 

cctggagatt 

ataaataaat 

cactcgggaa 

taagattgcg 

aacaaaaaca 

gaagttctgc 

aggctgcagg 

agtgaagagg 

tgttgctttg 

agaagtaaaa 

ttaagtctta 

tttgaccttc 

gaaaataatg 

cccagtgtct 

tccgacttcc 

aatttctctg 

tttattctag 

gtgtggtagt 

aattyctttc 

gcgccagggc 

aggcatggtg 

tattttccta 

tggatactgt 

tatgttaggt 

tgctctgttg 

cccaggttca 

ccaccacatc 

ggctggtctc 

gattacaagc 

ctgtagccac 

tgcttaggct 

tgctgtgcta 

agtacagtaa 

ttcatggaaa 



31440 

31500 

31560 

31620 

31680 

31740 

31800 

31860 

31920 

31980 

32040 

32100 

32160 

32220 

32280 

32340 

32400 

32460 

32520 

32580 

32640 

32700 

32760 

32820 

32880 

32940 

33000 

33060 

33120 

33180 

33240 

33300 

33360 

33420 

33480 

33540 

33600 

33660 

33720 

33780 

33840 

33900 

33960 

34020 

34080 

34140 

34200 

34260 

34320 

34380 

34440 

34500 

34560 

34620 

34680 

34740 

34800 

34860 

34920 

34980 

35040 

35100 

35160 
tatttctgac ttgtcctggt gatttctgtg tttccttaac tgaatcatca tcaagaagag 35220
attaaattgt ttagctatat gaacataatt tattaaccag tgctatacaa ataactagag 35280 
atgcagacag atrtgacgtg agaccaaaga gtgaatgaca cctgagcctg gcattcctcg 35340 
gagcttaaag aagagttcct ggtgtccctc tgttttttgg aatttttttt ttttcaaaga 35400 
aagggaggag gattaaagat ttatctcctc aacccctcat ttctgaacca tgaatccaca 35460 
tttcaagttc atcttacatt aggcccctag agtggggtat gggaagccct ggacaccagc 35520 
atgcctgcct gccagcagaa tgagttgatg ctcctgagac tgaatgctgt tttgcacttg 35580 
gcttccctat ttatactaag tttgctactc aggcttgaac agaatatcct tcactttttc 35640 
ttaacttagg gtgaactgtt tcatagaaca tttacttgaa agcctattga tcctgtttta 35700 
tttgaagaaa gaacagtgag tcagcaaata gctctagacc tgtctcagtg agtagtgtgg 35760 
gctcacttag cttctctggt ctgttggaac acagtggcac catcttggct cagtgcaacc 35820 
tctgccttcc aggttcaagc gaatctcctg cttcagcctc ccaagtaact gggactacag 35880 
gtgcgtgcca ctgtgcccgg ctaattttcg tatttttagt agagatgggg tttcaccatg 35940 
tgagccagct gatcttgaac tcttgtctct ggtccagttt tcttatctcg tgaatggttc 36000 
cagaataaac gattttgcag gcttttttca gccctgaagt tgtgctaatg tgactcacag 36060 
ttattatttt ctctaagttt cccccaaagt ttttaaaaat agctttctag acaacctaaa 36120 
aattgcagct gaaaactgta ataaaaataa aagttgtacc ttactcatta aatctctttt 36180 
ctgcaaagag atcatgttct gatttaccct tcagttggta agtaaaatta gataagtcat 36240
attagagttg gctgcaaaat ggtaagagga ttcctagaca gtaattttgc tttcttagag 36300 
tgagttctac agttgcccca aaataggatg gctttaatat gactagataa tcctctgctc 36360 
tgaaataacc tttttctgaa ggtaatcaac taggaagaaa tatttttaaa atatctcata 36420 
tgtcagtttg tattccataa tcaacagtgg tttattgtgg ttagtactat aataagtact 36480 
agtcaagatg ccttgtttag gtcgggtgca tcctgttgtc ctgtctgttg aaggttttca 36540 
ctttgagaag attgtataga tttctaccca gctatcttta atagacctta aaattatgta 36600 
ggttatatca tcagcttgtc ttgtatttgc ctatttaaaa tcatatgcag ggtggggctg 36660 
ggaggaaaag aaaaaaaagg aaaaaagcag tcatatgcaa aagcattttc tttttttaaa 36720 
gaggggctca ctgtgttgct caggctggag tgtagtggtg cagtcatagc tcacttatgc 36780 
agaagcgttt aacaccactc ttagtggttg ttgaataact tttgtggtgc tttatagata 36840 
aactattgaa aacttggttc ttggccttct cactccttag ggaattgtgt ggatttgtct 36900 
tgctcgctgg ccctctagta tgacactttt gttctccatt tgtggggcgg catgtgcact 36960 
gcatggtaat catttctgct gaggactgtg tagagatgcc attcaggtag ctgttttagc 37020 
ctttgagcct ctgtgtgtgt tctgaagcct aatgagagcc atttcaggtg gtttttgtca 37080 
cttgccaccc ccaactgcag acttgttcta gcttcttcca ccttgtaagt ataactcttt 37140 
tctgaaaaag tacctcctac atggctgcct tactgatgcc caggttcctt ctttaatctg 37200 
ttttgctctt tgcctacctt tgctggttgt tgttgttctt catctttctc ttttttttaa 37260 
attttatttt aaatcatacc ctcctatttg ttccaccaaa ctcatatact actttgcttt 37320 
tgcagctgct gctttgaatc tttactcttg cttttttcac gtgttccctt gccttgcctt 37380 
tggtatgttt tttgttttga ggtgacatcc agttcatgta atattctgga agtccaagag 37440 
atccagaaag tcaaacacta gacaacttaa ggtttattac tcacaagtcc tttctgtgta 37500 
gatattttac gttttggggt cacaggcttc gttagtaacc tgttatcata tgaatcatac 37560 
attgaagtgt ttgatgaact ataacacttt tccttgttaa tcactactag catcattact 37620 
atatttcatt atcatcatct gactttctta cttcatctcc atgaccagca ttttcacctg 37680 
ttttggcatg tgctcgtccc ccttttttgt actccttttc cttccattct ttgcctaatt 37740 
gctacttgtc cttcatttct ctgtctgggc ctcactcagc cttgattcac ccatgttaca 37800 
gagacctgtg aatactccag ttctagctct tagtacactg ttctcttatc tcctactgta 37860 
ttcgttctca gtcctcagac atgctgctct agcaacacat tattcctaga aaactccagg 37920 
tactttccca ctctcagcat acccacatgg cagcttctgc tcttctttga tctctgctca 37980 
catatcactg agtctgtgag gcctgtcctg acaacctcac tttagattgc tccctcacct 38040 
acccttcccc atgtgttctt tcctcttgtt gctttatttt tgtctggagc acttgccact 38100 
atctgacata ctgtgtactt attgattaaa atggattata tcatggcggg gcacggtggc 38160 
tcacacctgt aatcccagca ctttgggagg ctgaagcggg ccgatcacct gaggtcagga 38220 
gttcaagacc agcctgacca atatggtgaa acctcatatc tactaaaaat acaaaaactt 38280 
gccggacata gtggtgtgca cctgcagtgc cagctactca ggaggctggg acaggagaat 38340 
cgcttgaacc tgggaggtgg aagttgcagt gagccgagat cgcaccactg tacgctagcc 38400 
tggatgacag aatgagactc tgtctcaaac agacaaacaa acaaacaaac aaaataaata 38460 
ataggattat atcagtctcc ttactacaat gtaaatctta tgaaggcaga gattttgttc 38520 
tctgacatat ccataggtgc cttgaagaag gcttgatatg taatggggtt gtatataagt 38580 
attgaatgaa taaatcatca cttacttctc tctctctctc tctggtatac agtaacctcc 38640 
ttgagagcag gaaacatgtt ttcgttcttt ttaaattaaa ttaaatttaa tttaattaat 38700 
ttattttttt aattttactt tacgttccag gatgcatgtg cagaatgtgt aggcttgtta 38760 
cataggaata catgtgccat ggtggtttgc tgcccctatc aacccgttat ctgggtttta 38820 
aaccctgtat gcattaggtt cttatattta tataagtgta aacactcagc aagtgattgg 38880 
taaattgtca cttcagtgtg tgtcttcaga tgcgatttaa tagtaaacct ttctatgttg 38940 
actgctgggt tgtgttcagg aggttagtgt gttatggcga aattgtagtg aaacaaaagc 39000
agtaaatacg tttgggaact ttattctttt ttttcttttt ttgagatgga gttttgctct 39060
gttgctcagg ctgcagtgca gtggcgcgat ctcggctcgc tgcaagctct gcctcccggg 39120
ttcatgccat tctccttcct cagcctcctg agtagctggg attacaggcg cctgccacca 39180
tgcccacgtg gatcacgagg tcaagagctt gagaccatcc tggccaacat ggtgaaaccc 39240
cgtctctact aaaaatacaa aaattaactg ggcgtggtgg cacgtgcctg tagtcccagc 39300
tgctcgggag gctgaggcag gagaatcact tgatccccag aggtgaggtt gcagtgatct 39360
gagatcgcgc cactgcagcc tggtgacaaa ggaagactcc gtctcaaaaa aaaaaaagac 39420
ttggaggatt tatttgtttg gggttgattt agttttaggt ctaatcttta tgattcttat 39480
cttatgtatt aataaaagaa aaatctcttt acatttaatt ctggcctgta tcctcaaagg 39540
gcatgatttg cttcttgctt ttctagcatg gtatgttaaa ttataagact ctttgcctct 39600
ttaactgtgc gcttccttcc ttaggaatca gagaagatca ttgctgagtt gaatgaaact 39660
tgggaagaga agcttcgtaa aacagaggcc atcagaatgg agaggtcagg aggttaaaat 39720
ctggaaatgt ttctaagttt tctaggggat gtggattact ttatcgtaag aaaagataaa 39780
atataaatta aaatcaaaga gtctgccctt ctaaaacaca aaacgaaacc aaaaaaatgg 39840
acagggattt atggccagtt gaatttatgc tttgaaatga catctacata tttaaatatt 39900
atctagggag tcacaggttt attagttacc atattacatg ttgctaggga ttcataaaaa 39960
tggaacatat tggccaggcg cggtggctca cgcctgtaat gccagcactt tgggaggctg 40020
aggtgggcag atcacgaggt cgggagatcg agaccatcct ggctagcatg gtgaaacccc 40080
atctctacta aaaatacaaa acaaattagc cgggcatggt ggtgggcacc tgtagtccca 40140
gctactcggg aggctgaggc aggagaatgg cgtgaacctg ggaggcagag tttgcagtga 40200
gccgagatgg tgccactgca ctgcagcctg ggtgacagca agactctgtc tcaaaaaaaa 40260
aaaaaaggaa tagcctcttc ttttcttttt tatattaata tgcatatcat ttatgaatta 40320
ttgactatcc caaagtcagt tgagacaaag ttcttatctc agtgaatata actttaacaa 40380
agcaagattt cattgatttt gaatgtgatg ttgcttttag aaataacttc tcattaatga 40440
tgagataagc acattatttt atgtttgcat tattggctat aaacatggcc agttgttttt 40500
gctttgcagg ctttattttc caattcatat tagactgatt taattgacat tttggtatta 40560
gtgttctgat atacctgttt ttttcctaga gaggctttgt tggctgagat gggagttgcc 40620
attcgggaag atggaggaac cctaggggtt ttctcaccta aaaaggtagg aaacaatgct 40680
gtgaaaccta atcagacaga gacacttttt gtttgtcttt gttactgggg cacttcatgt 40740
ctttcaggac ctactctccc ccttattatc ttgatacttg ccttaatttc agatatttct 40800
gatttcatct tagagcatgg tcttatagta caaaggcagc ctatttctaa aatgcctatc 40860
ttgtttactg tttacaaaag aaaaaaccta aggcaatatg taacctttgg ccatgattca 40920
gtttgtttag tcatctttat ttgttgcttt ctcttaaatt cccagcctaa cctagcgtac 40980
tttcagtttt ggctactctt tcatgtagat tttcccacag aagtcgtcta ccctgcttta 41040
taactttttg gaaccaattt cagctgtaca ggtttgtgga gaataatact gcttccattt 41100
gcttctgcaa tttgtattat agacagccat tgtactaagt aaagcaattc ctgtccctat 41160
ccttctagac tcctctgctt ccttcttcat tttttaaaaa gtgtctccaa tagttacctt 41220
cctttttaca taatcatagt tatgaagtag ttacgtagct tttagctaat tagagtagta 41280
agggtttttt tgtttttttc ttaaactaag ttgcaacgtc ttgttggata ctagaaaatg 41340
actatcaaat ggttattttg tgtgcttatg tatcagaaaa taggagaaca tatatcgatg 41400
tacaaagcat gtattttatg ccactagtaa attgtccttt atttcaagtg acctcacatt 41460
ggctattgga gacttaaaca tctttattca taggttctgt tgatgatctc tctaagttag 41520
gatacaaggg gtggtgagtt ttgaattggt acttttatgt gatatgaaat attcctgtgg 41580
ttttcgctac aattcttctg gtcaggtttt tctatcctgg ccacataact tttttttttt 41640
ttttcttccc aaattccaat gtgttatttt caacagaaaa gactgatgta aagacatttt 41700
agatgcatat tgtagaatag aacagttaag tattttatat gtttatatta tgaaaggtta 41760
gtggttttca cagttttaat agtttttcct tcagtaacac atgtattcat tcagatatgt 41820
tacttaggat tatatagatt ttagcagtta cagttttggt ggaatttaat tgtgcctatg 41880
catatgtcta ataaaggtat tttataacat cttacagtat taaagactgt gaggaaattt 41940
tttttttttt ttttttttga gacagagtct cactctgtcg ccaggctgga ctgcagtggt 42000
gtgatcttgg ctcactgcaa cctccgcctc ccgggttcaa gtgattctcc tgcctcagcc 42060
tcccaagtag ctggactgca ggcacatgcc accacgccca gctaagtttt gtatttttag 42120
tagagacggt ttcaccatgt tggccaaggg tggtcttgat ctcttgactt tgtgatccac 42180
ccacttcagc ctctcgaagt gctgggatta caggcatgag ccactgtgcc tggctggaaa 42240
atttttatta taacaactag tgaaaatatt atgcatttaa cataaatcat aaataacttg 42300
aacaaatgtt gagatgtaca ctgcatggaa ttccaaagtg agatctctgc ttccaaagct 42360
ttttcttact ttaattttac ccatcttgct ttagctttct agacttcttc atgaaaggaa 42420
gggatatacc agtgaccaca tccaaactca gtcttaggac ctgtttctga gattccattt 42480
ttcaaattat gatgtgcttt aaagaatgct actcacaaag ggctttgact ttctcaaacc 42540
agtcaagttc ttcaactaca aagggggatg ctagttgaga agatgtggag gtttttcaga 42600
tttctcttta accatctttt atagttcctt ttcctacccc ctcatctgac tcttctttca 42660
ttcttccaca tttgctggca ctctcttgag tagattttgg cctcaagacc atgtgactta 42720
tttttggatt ccagtgggtt attttcacta aagaaggttg gttttttaaa aagattttag 42780
aaatattaga gttagatgaa gaggattgtt agaagatgtt tatcttatcg aatatcagta 42840
attttacttt tttctttaag gtaaagacta ataaattaaa atcttttaag gtaaagacaa 42900 
attaataact gggatcagaa gaactgataa aaattatagt gggaagtggt gggggggtat 42960 
ggggacattt taataaaagt aaatatttct tgagattttt acccaggcat ttacaaaaat 43020 
tataattatt ccgttcagag aattggcctg tcattttttc cttgtgcttt gaccttatca 43080
aaaattaggt cttgtttttt tatcccagaa ctcatgtttt aaaatggaat gccaggcaca 43140
gtggctcact cctatagtcc caacactttg ggaggccaag gagggtgggt cagttgaggt 43200
cagtagttcg agaccagcct ggccaacatg gtgaaaccct gcctccacta aaaatagaaa 43260
aattagccag gcgtggtggc gaacacctgt aatcccagct acttgagagg ctgaggcagg 43320
agaatcactt gaacctggaa ggcagaggtt gcagtgagtc aaaatcatgc cgctgtactc 43380
tagacagagc gagactctgt ctcaaattaa ttaattaatt aaaataaaat agaatgcact 43440 
tggttccgtt tggtgaggcc tgaagagagt gttttctctt tattatggaa ccgcaactag 43500
ggataaaaag aaatccctct ctaggctctg tggagagttg gagaagcctg gcctcttgta 43560 
ggcccctgtt gttaccacag ctctctcatg gtgccctgct ccaaattgat catctgtaat 43620 
ctttttcaga ccccacatct tgttaacctc aatgaagacc cactaatgtc tgagtgccta 43680 
ctttattaca tcaaagatgg aattacaagg tatatttatt tcctgttttg gtcacttcgt 43740 
gtgttttccc cctcttagat aattgaataa ctaaagggaa ggggttgaaa aaattaacgt 43800
aatgatttgc tgtatttttt gtctgaaata gttacaaact atgctctctt tccaaataat 43860 
gtgtttttgc cactggagcc agttactatg tagtttttct ctgaagaccc taaataattt 43920 
ttttttcctt taacaaatat acattcctta gggattttat ttgactcatg tctttataat 43980 
actgtatgag ttacattggt atatcagtgc tccttgtttc ttattctgat gtagagaggt 44040 
tgatcatatg tgaggataga attcatagtt gaaagcttct gatgcaacaa ggcaatacct 44100
tttttgtacg taccaaagga tattctttgg aataaagagg ttcattgttt gtagtgcttt 44160 
ctgtgtctga atttccctgg gaaacacttt ctcttgtgtt cagggttggc caagcagatg 44220 
ctgagcggcg ccaggacata gtgctgagcg gggctcacat taaagaagag cattgtatct 44280
tccggagtga gagaagcaac agcggggaag gtgagcattc ctggctggag cttcagcaac 44340 
aacattttca ttttatatta tgagaaatcc ttaagacttt gtattctctg tctatcagta 44400 
gtactttctt atacaatcta attctgaaaa atggagagac ctgggctgct tatgaatgca 44460
gagatggaca aggctgcttt acatgaaaat agcttggaca aaagaagccc ttttttactg 44520 
ccaagaactg agaaggacat aggcaattag gcttggtctg gaatgttaat tatttaatag 44580 
aaaagtaaga aaatagcaga tatcctgggt aataggagat ttgaaggaca ttaagtcaac 44640 
ccagcagaat ttatttttat ctaaaaggga agaaaaagtc agtcttgatt tttgcctggg 44700 
ttattaacaa aacaacaatt taatggtttt tttctgttat ataagtcact cattccctta 44760 
tcaaaatatt agcttctcag tctttagttt ctggttatta cctatatctc atccttacaa 44820 
tttctgatgg ttctgagttt tattgactga accgtcagag atccctgaaa ctaatatttc 44880 
ctatcatctt cttaggttta tcaaatagag ttaaatgttc ttgtgttagc catggccaca 44940 
gatagcctct tctattgggg ctagttctgg taccccaaaa tgaactactg tatagacaac 45000 
ttcagccact tgattgattg cagggattta ttctacttac tgcaaatcct gataagcaac 45060 
tgctttccat tatttgattc caatagtttg taatgataac attagtttgt gtttgttcct 45120 
cttagttatc gtgaccttag agccctgtga gcgctcagaa acctacgtaa atggcaagag 45180 
ggtgtcccag cctgttcagc tgcgctcagg tgagactggg agaggtttgc catcttcagc 45240 
aatgtgcaca tggcttctgt gacaactcta atttttggct gtttaaaggc tgaagtaata 45300
gtcagcatta ggatttttgt tcttgtaaaa acaacagctc tgaaagctgt cttttcacat 45360
tagggagggg tgaggttgtt aaatagatac tatatattaa aaaaattatt tcttaaccct 45420 
atttttctgt tttgtgctag gaaaccgtat catcatgggt aaaaaccatg ttttccgctt 45480 
taaccacccg gaacaagcac gagctgagcg agagaagact ccttctgctg agaccccctc 45540 
tgagcctgtg gactggacat ttgcccagag ggagcttctg gaaaaacaag gaattgatat 45600
gaaacaagag atggagaaaa ggtaatgcac agttacgcag cccatatgac tgtttcttct 45660
tttaaacatg taatactaat agcattcttg aatttttttt ttttttttta cttctaggct 45720 
acaggaaatg gagatcttat acaaaaagga gaaggaagaa gcagatcttc ttttggagca 45780 
gcagagactg gtaggagtcc tgaatctgct aaactgttgg gaaaagggca gcttgttccc 45840 
atactttccc tgttccacag agcagtactc acccaaattg cttctgtctc artgatacca 45900 
agcactattc tttaatttcc ttaatggaga atgaacttaa atctcccggt agccttagcc 45960 
tgaaaaaata gtccacagag gtactctttt gggcttttta tgtcttaaag ccaaatctta 46020
acttctgtta caaccaaata ctttttaagg aaatagagct tttctggtag cctttgcctc 46080
ttgatagtgg ttttggaatt tgcttcagtg gtggttcttt aaatgataat tactctgaat 46140
attgaatttg gtgagagttt gccttggttt tgtttctgat cacttgatag tactaattct 46200
ctgctcttgg gctgactttg ggattgttct tacgctgggc agactttttt tttttaagtt 46260
aaactgtgtc taaaagtgtt gctgcacagt tgcatgtgtt actcctttcc ttatcccctg 46320
catggagtct gaattctcaa tcaggttctc agtggcatgt gtggtagcgg tgggagcaaa 46380
ggctgcatac ccagcccgga caggacagaa ggcttgcttc ctttgaggga aggaggattt 46440
gagtgagcag ctggaaagtc tgtttaaggt cccagctatt gatacaatac taatggcttc 46500 
agcgttttat aatggtaatt gatttcccag atttgataga aactgaatct catgaataaa 46560
taagttgatc cattagttag attttcattt ttaaagccaa ccaaacaacc aattttggtt 46620
aacaaagctg tggttataag aaccttctca gcattcattt gatatgtaag aataaagtat 46680
aactaacctt tggatactgt ggatgcagtt taggaagttc tcatttttta agattcagtt 46740 
gtatctttct tttaaggtaa gggggcatta gggagaaagt tcaagcactg gcttctttcc 46800
ttagtataaa atacagcatg taaactacag ccttctagtc atgctctgtg ctttggagaa 46860 
gaaggggctt agtgctaagt gggccagaaa cgtgaaggac ttgagcacaa gcaacctttt 46920 
ccagaatacc tcagttctca ggcatgggca tggattaaac cagttagttt gttatcttgt 46980 
aaattatagt aaagccctaa aaagaatgga ttgtaagaat aaatcaaggg tttggggcaa 47040
gataatagca gtgttaccaa cagaggctct ttaggaggcc tttttagcat taagttttaa 47100
tgaaaacgaa agaactagct acatgtgatt agtggcttga aaggttttgt gacacaaaaa 47160 
tgtcttttat cttaataatg tagtacattc taataacatg cctctgatgg tctgtgagag 47220 
atcaaatgag gacaggtgta attaagctct attattaaca ataatgcctt gtatttgtat 47280
agtgctttat aatttacaga gagcgttaac atccatcaat ttatttgacc ctcaaaacaa 47340 
ccctctttgt ttacttatgc atatgtataa gtaaaatatg taatattacc tccagttgag 47400 
gaaaccaaaa cttggctggg cgcggtgggt cacgcctgta atcccagcac tttgggaagc 47460
ctaggcaggt ggatcctggg acctcactta gaataactaa gcacggtgct cgcttcggca 47520 
gcacatatac taaaattgga acgatacaga gaagattagc atggcccctg cgcaaggatg 47580 
acacgcaaat tcgtgaagcg ttccatattt aaaaaaaaaa aaaaagaaac aaaaaccaca 47640 
aaaacaaaac aaaacaagac aaacaaaaaa aaagaataac taagcacggc agaataagga 47700 
tgcctggtgc tatccataac caagagttgg tgatcattct gtgccctaaa ataaagatgg 47760 
ccactaaata aagagaaagt gaatgtctaa gacgtattta gtttctaggg aatgtacata 47820
ccctgcagat aatcagattc tggttgttgt ttgctctgtg tatctgaaag caagccagga 47880 
tttcaagagc agcttcttaa agcaaggaag ttgctttcct ctcaaaggtc tgtctgttcc 47940 
actttcattt cttgacttaa gggaggaact gatttctaac acttcagcct gaagaatatc 48000 
taccagtagt aatactgaat ggaaaatgtc tttatatatt accctgtatt ggtcaattaa 48060 
cagatgtatt tctaatgcac cctgtgaata aacattgaga gaatacaaag gaattattaa 48120 
tcagtaaata tttatttaat atttaccata tgtaatcact ttttgcataa tgtcagcatt 48180 
aaacaatgga attcaggctc atgagttttc tattctgata gttcccaaat gcatcacatt 48240 
gttagggtgc tgagtatttt cttcagttag tgctctagga taccctggtc ctttctgtgc 48300 
ttgcttttta cacacagttt tgggatcttc tctcttctta tttgcataag ttccttgcag 48360 
agtctcttta ggaggcctac tttgcatttg gcatttaatg aaagccgaag gatcattttg 48420 
tgacatgtga attgaccaga cctcagtagc ttttccaact ccttaataaa tgaatctaaa 48480 
tattttaaaa atctggccag gcgcggtggc tcacgcctgt aatcccagca ctttgggagg 48540
ccagggctgg cagatcactt gaggtcagga gttcgagacc agcctggcca acatagtgaa 48600 
accccatctc tactgaaaat acaaaaatta gcaggacgtc ttggtgggca cctgtaatcc 48660 
cagctgcttg ggaggctgag gcaggagaat tgcttaaacc tagcaggcgg aggttgcagt 48720 
gagccaagat cgcgcaactg cactcaagcc tgggtgataa agctagactc agtgtcaaat 48780 
aataataata ataataataa taataataat atagataaat ataaaaaatt ctttttttga 48840 
catggagata ggatttatat cacataaact ttgcatacag tttttagtat caccatcaat 48900 
atgtgacagg gtgatacatt tgaactttaa agaggacata ctctacagag tcaaaatggc 48960 
agtattttaa aaagaaagaa aatgacagct gctagatttt acttgagttg agttcttttg 49020 
gatttatata gacttattta tattttattt tagagtttaa agataacttt tgtctcttaa 49080 
gcatgatatt aaaacaattc ttagaatact tgtcatttcc tgaattaggt aatattcatt 49140 
tcatccttaa gcaatttgaa gaggaaaaga aaaacagcca gaaaaaaagt gatagttgtt 49200
ggttatgtgg gaaagaatca aagataattt gtttatgttc tgtaaaattc agtttataca 49260
tatttgcaaa aaatcctctt caggagtcag acatagtttt ttctctaata tttatgaatg 49320
tcatatttat cacttaaaat tagccattat gattttaaaa actaaaaatg aaaagaaaaa 49380 
ttaatgtaga gtctgtgtaa atttagcaaa cttttgttgg taagttatca aataggtggg 49440
agttataaat gggcttagta ctcatttgta ggtcaataga aataatacaa atcagtggca 49500
agatttttaa attcagaggc caaattatta ttattattgt ggaaagttac ttttttgttc 49560 
cttagaaagt gactatacag gcaaatcttt ggttttgtca ggtgtgatga tgtgtgcctg 49620
tagtcccggc taaggcagga agatcacttg agcgcaagtg tttgagacca gtttgggtcc 49680
ctgtctcaaa aaacaaagca aacaaagaaa caaacaaaaa caacaaactt tggtctagga 49740
ggcaaggtac tatttatcag tttcaagcaa acaaactctt ctagagatat ttttggtaac 49800
cacagtgaat aaaagattga ctgttagtaa ctattttcat ttactagact gccctgaact 49860
actcatctgt gagttgagta tttttttttt taagtgaaag catgtttatt aagaaaataa 49920
aggaataggc tgggtgcggt ggctcacccc tgtaatccca gcactttggg aggccaaggt 49980
gggtggatca cctgaagtca ggagttcaag accagccggg ccaacatggt gaaaccccat 50040
ctctaccaaa aatgcaaaaa ttaggtaggc gtggtggcag gtgcctgtag tcccagctac 50100 
tcgggaggct gaggcaggag aattacttga acccaagagg tggaggttgc agtgagccga 50160 
gatcgggcac tccagcctgg gcgacagagc aagattctgt ctcaaaaaaa aaagaaacaa 50220 
agaaagaaag taaagcaata agaatggcta cttcataggc agagcagctg agtattttct 50280 
tttcccagca
attctaaatc 
taagagcgat 
aagacagagt
aagtttcaac
taaagcatca 
taaattaatc
aatttgataa 
aataacatgc 
taaatatgat 
catagctttg 
actcatataa 
gaatctataa 
tgtatcattt 
ataccaacaa 



acttttatct 
caacattgct 
tgcttattat 
ggctaagtaa 
taaacatacc 
tgtgtttatt 
gggttttcta 
ttaaattata 
tctcaaaatt 
acacccacta 
aatcgtacga 
ggaaagatga 
ataatgtaag
tgaatgggga 
ggtttaagaa 
acagaaactc
ggggtaaagg
aagatcacat 
gttggagaag 
cctccgcgga 
aggcaaagcg 
agttagaggg 
gccagtttgt 
gtcgagaaac 
agacagtgtt 
acattataaa 
acctcccttt 
agaactcctt 
acagacttgt 
tgacatccag 
gcagaattaa 
aactgggtgg 
tagaattttc 
catcttccac 
tggttttttc 
attggcactg 
gacacacatg 
aaaacttgtg 



cccgaataga 
ttcccagaga 
ctatacagcc 
ggttttctga 
ttttaaagtg 
ggaagagttg 
gatagaaaat 
ttcagtggaa 
ttggaggtga 
tgaggatcct 
cggaaatgga 
tgctcatagg
attctcatct 
gcgtcctaca 
tctgtgctca 
atgtggggaa 
ttttcttcga 
ggtgacatac 
ttaagccaga 
agaaatgtac 
ttttgttttg 
ttgctttttc 
attttttgat 
gaggtcgtgt 
caagttgcaa 
aattaaaatg 
aatctcagat 
tgccatgtat
cgtctgggat
atgtctgtta
tgtctgggat
cacacaaagt 
tagcttgtaa 
aattctttct

tccatgttat 

cttgaaatta 
ttcgatttaa

tttgtttttc 

gaagagagct 

accattgtta 
cttaaaattc
gatgtagatg

aagcaaaata 
aatcccactt

caagatttgc 

cattgttgtg 

atggtgcaaa 

atgtaaccag 

tgatattcct 

tgtacttatt 

ccagctttat 

gatttgtgtg 

agctttcttg 

atgacaaaaa 
ttctagggct

cctttgttaa 
gggttgttag

cgttaagtca 

tgtcaattct 
gtcttaatag

tttttggtct 

ttttcctttt 
ggaaactgat

aaaaatgtgg 

cccaaagaag 

aggctgtcaa 

agattgaagc 

accccaatga 

ttggggatga 

acctcaaggt 

acatgaaaga 

tgccactgat 

ttggtgctgg 
gtctgcgctg

caaagcagct 

cccgcttccc 

tcatcataac 

aagggaataa 

ttccatgggg 

ttaataatca 

agccgaaaag 

gtcaccccac 

agtcttactg 

agaagcagac 

gacagttctc 

ggacaaactt 

tctacctttg 
gcagtttctg

ttttagtgcc 

ttatgttttt 
ttagaaattg

tccacgtctg 
cggatctcag
tcgtcggcag

ctttaagagc 

agaagatgag 

agaagagagc 

ctctcaagga 

gcaacagcca 

tacgcgctgc 

tgctgatgta 

taattataac 

tgacaaaccc 

agcacccaat 

gaaatcatgg 

gccttgagtt 

acactggtaa 
taattaatat

ataatgccta 
gggacagaca tgggggacaa agctacagca tatggctctg tggtgctgct tttttccaat 54120
agttgatgaa atagtgaaat aaaagttaac tttggttggg cttggaataa aggagatgat 54180 
acataataaa ctattctttg gttagcatta gtaatgcttg tagacactca aacaaggaag 54240
gcgaagtcct gatgatgatt tattgtatca atacctcatt ttatttggtt gtggtctcag 54300 
tttgcctcac tggagaatcc actaagaaag atggatttta atgggaagaa aagaattatt 54360 
ttctacatcg ttcctcatct cttgagtttg tcaaaaagtt ttctaagagg cagtgtgaag 54420 
agccagagat gtccgaaatt gcctgacttc atgttttgga agattcctaa gtcacatgaa 54480 
atcatctatc aaacacktgt cttccaaacg gtgtgaaaaa ctacagccaa gatagcagtt 54540 
agcattcaag gaagccttaa agattgtgat tatgtttttt tttcctttgc gtgtgggcac 54600
tatgctttat tagaacacat tatttttaat catagtattt tttctttgcc tctaagaaaa 54660
tactttctta atgctcagaa agttgcttcc aattaatgtt tttttcccct taaaaagaga 54720 
agctttgaga gatatttttg ctttcatagc tagaacagtt gaagtcttca actgaggttt 54780 
tatagcagat tagacatggg taaatgatgt ctgtaatggg ttgagttact gagatgacaa 54840 
tctcctgtgc catttggttt gaatgtactt gataggctgc ttcaaatcag tcacttcaat 54900
gcattttgtg taaacccagt tgtccttttt tattcctctt tagacataaa tgtgctaact 54960
attcctttct atacacaatt tattttctaa gattaaaaat aataattgct aggtttggtg 55020 
gcacatgcgt acagccccag ctacttagga ggctgaagca ggaggatccc tagagcccag 55080 
gagttctggg ctgtggtaca ctatgccagt tgggtgtaag tttggcatca atatggtgat 55140 

gtccctgaag caggaaacca caaacttgca tcaggacggg tgaacccagc ccaggttgga 55200 

aatggagcag gccaaaactc ccaggctgat cagtaatggg atcttgccta tgaatagcca 55260 

gtgcactgca gccaggacaa cagagtgaga acctgtcttt aaaaaaaaaa aaaaaaaaaa 55320 

aaaaaggtaa tatttattat attattcagg tttcaactct gtagcaaaaa tgggctctca 55380 

tttccctaac ttgagataac atagggtagg tccatatatt ttcattctta cagtggtctt 55440 

ttcatgggag tgaatgagtt actctccact ggtgattagg taatactgta gaatgaagag 55500 

ttgtataata tattcattta cagctgtgga ttgtggtaag gactatgtcc acagtgatat 55560 

tccaaagaat tgggtttata tttgtgcttc atctgttaat cccaggtgtc ctcatgttgc 55620 

tgaaatattt agatagctaa aatatccctt aatttcacag atgaccagga agaaattaac 55680 

caaggtttta ttgactgcca tgtatgtccc atgatgcatt tctgagcaaa tgcttatcct 55740 

agagaataac tctgtatgaa taaaattgct taattgagtc tcttactaaa taagtaacta 55800 

gtgccatgct tttgtgagct cttggtatgg cccatattac tttgtttttt gtttttgtta 55860 

ttgttgtttt gtgatagtct tgctctgtcg cccaggctgc agtgcagtgg cacaatctca 55920 

gctcactgca acctctgcct cctgggttca agcaattctc ctgtctcagc ctcctgggta 55980 

gctgggacta caggtgcatg ccaccatgcc tggctaactt ttgtattttt agtagagaca 56040 

gggtttcacc acgttggtca ggctggtctc gaattcctaa cctcaggtga tccacctgcc 56100 

ttggcctccc aaagtgctga gattacaggc gtgagccacc gcgcctggcc tgtttgtttt 56160 

tttaacatga tttttctcta agcttaaata ccacaaggcc aaagagaaat ggtcataatt 56220 

taaaccatta ttatattgtt gaggtatccc tagctattat tatagcaaag tgggaaaaaa 56280 

gtgtttattc tattgaagtt atgtaatgat ccgacattaa tgggaatata gaggagtcct 56340 

aattaattgg tataatttca caaagcggaa tggtattcct tggagagtta aagacattct 56400 

ctttagtaag tgtaaaccaa agggcatttt ctttattcct gcttctaatt ccttctagcc 56460

cagtgaatta tttctctttt cactggtaat gtgatgaatg ggaattgttt attacattga 56520 

agtgacttga agtgaccttt tgtgctttag gtgcaggttg acactgaaaa aaaaacaaaa 56580 

crctgaattt ttcacaccta tgtctgcatt aaaggctgtt ttactaccgg aagttacata 56640

gacttcctgc agtcagctga tgtgccccag tgccttactg gtcctttgta gatttgcctt 56700 

aatgatttgt acaaatgact gggaggcggg gatgctgcct gtgtcctggt gaaccttaat 56760 

gaaggggccg tcttaggcac agtgcaaaac aagcatttgt cctgtactgt tagagccaaa 56820 

attgtgatga gcaatactga taattgtcca gtttatgtca tctttcccag attttaaaat 56880 

ctgttctaga tattcttagc ttgaaccact tttgattgtg aaatgtatta ggtgttgtcc 56940 

cattattact gtaaaatgaa gttttgaatc ttcttgttaa taaactgtgg atttcccctc 57000 

tcaatttctt aaacaacaac aaaaaaatgc ttgaagattg tctttgagtg taagatctgc 57060 

cttttcagaa agggagtgtt agtttgtaat gttaaaaaat aaagacctca ttcaataaaa 57120 

gttgaagtca tcttttaaga gtgtgatttc tctctatgtg ggaagaggga aaaggaaagc 57180 

aaggtaatgc taactaaacc tggttttgac ttttatttat ttgctttttc atctgaagat 57240 

ggttgtaaac taaatcttct ttttgatatt ttctatgtga acttgattag ttttaaagct 57300 

tttgctttag ctaccctttt ctgtttaaca aaattctatc ttaagtgaga tctttcagcc 57360 

ctattttatg ccacatttca cttacataaa atcttttctc acaaaagatg aataaggata 57420 

ttactttctg atttgagttt agtttaacca tgttttcagt tttttctcta catactcttg 57480 

gtttgaagtt cttttaaatg aaatgattta ttttccttta tatgggcaga aactaaattt 57540 

ctaaaaagtt ttgagtgact ctgcaggttt tctttctcct ttaacctctg cttttcttca 57600 

cccagagtat tgatggtaca ctagagagga aacaggacaa agcagaatat ttttctggaa 57660 

tggaccacta gtctcattga aaatttggac ctgacattat tttttaatgt ttaaaaacaa 57720 

cataatcacg catcatggga tagaatacaa aatccacatg agtttttaat atgaaatctg 57780 

aaactttaga gaagtgttat gcctgaatcg gtctgcttcc agcctcatac ccttttcctt 57840 
caggtacttt gtaatccttt gctgtcggag aaactgttgg aaaagttgca atgtccctac 57900
cgtttatagg gtgacacaga agattacagt gtgatatgtg tagtgacact gtgctgatga 57960 
gactgtggcc acacagtgct gctggtggtg agaagcgaag ttaagtgttg tcaacaactg 58020 
gggtgtcatc ttgggtgtgg atattttcct tccagctatt gcttgctaat tataatctgc 58080 
cattaggagc caagccttta atgtctaaaa cagcctaacc ggaaaactca gaatggtgat 58140 
attgatttta ctgtgaatat caccttctca gaggtgcctt aattgtacca ccatcatctc 58200 
tcatcgagag taccctagct tgtctctttg cttctattct tgacccctat aatccaatgt 58260 
ccacatagca accaagatca tcttttaaaa acataaatca tatcttgtca gtgtcctcag 58320 
gggctttcca ttgcatgaaa gaatcttagt cttactgtag tctccaaggc tttacatgtt 58380 
ttggctgctt acctaccact cttaacactg tcactctcac accacagaca ctctgccctt 58440 
ttttttgttc cttgtatatg ccaagtttct tcccacctct gggctcttgc acttctcaag 58500 
cctggagccc atttccatca tgcttacctc cccttccaga cttgagctcc tcagctcttt 58560 
ctgcccagcc caccacaagt tgactcactg ttttaataat gatggcctgt aattatcttt 58620 
tctttgtatt tgtttggttc ttccacgaga atgtaaactc acctagaata aatgccgtat 58680 
ctgtcttatt cacttcgcca gtcatataaa tccataacct ggaagagtgg ttctcaaact 58740 
ttaatgtcag ttagaatttt ggggtcttat gagaacacag actattgggc ccaccttcag 58800 
agtttctgat tcactaggtc tggggcgaga cctgataaat ttgcatacct cataaattcc 58860 
caggtgattc tgatgctgct ggtccagtaa acacctttga taaccagtga tctggaatgg 58920 
gggctctggt aagcattttc tgtaaaacac tagatagtaa atattttaga cttagtggac 58980 
catgtggtct tttttgtgaa aactgttcaa ctctattgtt gtagtacgaa agcagccata 59040
gacaatatgg aaatgaacaa gggtggccgt gttccagtca aactgtgttt acagaaacag 59100 
gcagtgggct ggatctagcc cgaaggccat agtttgccga ccccgattag aagatagcta 59160 
ctttgataaa tatttatcta aaagatcagt atgtgactgt gtatgtaatc tagaagacag 59220 
atacatgcca gttgctggga tgagtggtaa gtgacaaatg tgtggtccct ggcctcataa 59280 

agttctcaga gttatggagg gagacagctg taataaccac accagtagtg agtcagtaca 59340

tttgtgacaa gtgccatgat ggaaaaacac aaggtgctct gaaagggttt aacgggtgcc 59400 

tttgggaatg aaaagtggcc tctcattgct cttagctgac aatgtttttg tttcactgag 59460

gaaaaaaagc aatcagaaga aactttgttt tcttcctttg tctagttttg cctgcatcat 59520 

ccatccatac ctgtactggg ctttccctcc tgttagtgga taaactgtcc ttgatcctct 59580 

cctagcccag cccttgctcc ctgcattcca tgccctctca agaacttgtt gaacatttac 59640 

ccccttctct tttacatcat cagtttctct cctggattat tcctgccagc atgcaaacat 59700 

gtcaccagtt gcaaaagttt tctgtactta ttacgtccta cttgcctacc tccattcttt 59760 

tttttttttt ttttgagata gagtctcact ctgtcaccca ggctggagtg cagtggcacg 59820 

atctgagctt actgcaacct ctaccttccg ggttcaagta attctcctgc ctcctgagta 59880 

gctgggacta caggtgaacg ccaccacacc gagaattttt gtatttttag tagagacgga 59940 

cggggtctca ccatgttggc taggctggtc ttgaactcct gacctcaggt gatccacctg 60000 

tctcagcctc ccaaagtgct gggattacgg gtgtgagcca ccgctccagg ccccattttt 60060 

tcttttcatt atttcctatc ctcattcttt gctggaactc atctagtcag actcttgttc 60120 

tcccctcacc accaaagctg ctcttatcag agtcacaaaa gctcttcttg atgatagatt 60180 

cagtgcattt tccagtcctc atcttactga atgtctctgg catttgacac agttgtctgc 60240 

tcttcatggg acactttctt ctcttggctt cttttccttg aacctttcca gctgctccat 60300 

catggtctgt ggacacagac cttggctatt tctatrtgtt tctctgttcc cccttccatt 60360 

tttgtactga tgatgaagat aatgataact catatattga gcatgtatta catgtattaa 60420 

tacacataaa ccttttacat gttttgactc attgagtcct cacaacacyc ctcttctgtg 60480 

aggtaggtat tgcttattcc cttttcacat actttcccag tgtaacacgg ctactcagcg 60540 

ggagagccag agccagtccc agggagtcat tctgtctgca aacctctgcc ctgcctcgcc 60600 

ttctgctgac tctcaggtgt cctctctctc cagcctgggc ccctaccctg agctctagat 60660 

gctgatgata cacaactact gcctgacatc caaatcactg gcaggcacct tacatttaaa 60720 

tcatttgaac tggagcagct agctcaggcc acaaatccaa gtcaccttta gtttctcttt 60780 

ttcttcacat cccatgatca gatcatcagt agatcctgtt tgctctccct gtaaaacata 60840 

tcctgactcc atctcttttc accacctttg ctgctaataa cctggtccaa gccaccacag 60900 

tgctggacag tgctgtgaca gcctcatgac cattcttctt atttccatag ttcctcctga 60960 

gtcccacaga gaataatgtt tttaaaacat atcccagctc taaaccatcc agcagtttcc 61020 

attagaataa tatccccaga ccttaccatg gtctataagg ccatctgtac tccatttgct 61080 

acctgattac atctgatgca tctcctaccc gtctccctac tctgccattc catccacaca 61140 

agttttcttg cttttcctga aatatgtggt gcccctttcc atttcaggac cttggcactt 61200 

gctgttacta tgaatttgag tagaataacc cctgccaaag acctctgcct gcttaaccta 61260 

ctcagtctag aagtctgtct cccattcttc tccctggtag tctctaaccc tatctttatc 61320 

tttcttcata ctagtactcg ccactaaaat ccatctatat tttgtttact ctttatcatc 61380 

tctgtctcct ttgtcaatgg aacccttgtc ttattgacca ctagcatttg gcacatagta 61440

ggagcacaaa aggtatttgc tgagtggatg gatggataat ggtttaaaaa tcatgctctg 61500 

gattgcctgg gctcaagtct catttttagc atttgcttaa cctctttctt tctctttaac 61560 

atgaggctaa ctatatagta ccatctttat ttttttgttt tgttttgttt tttgtttttt 61620
tattgagaca gagtcttgct ctgtcaccta ggctggagtg tgtgacacga tcttggctca 61680
ctgcaacctc tgctgccctg gttcaagcga ttcttgtggc tcagcctccc aagtagctga 61740
gactacaggc acacgccacc acgccctact tatttttttg tatttttagt agagatgagt 61800
tttgccatgg tgaccagact ggtttcaaac tcccgacctc aggtgatccg tccaccttgg 61860
cctcccaaag ttctgggatt acaggcatga gccaccacgc ctagcctagt accatcttta 61920
taaggttatt gagaagattg taataattca tgtttaacat taagaacact tcctgataca 61980
aaataagctt aaaaaccggg gtagataggt aggagagaac attctagtag agagaacagc 62040
cagtgctgtg cctttataaa tatggggcat attttgaact agttaggatt tttttttttt 62100
ttttttttga gatggagtct cgctctgtcg cccaggctgg agtgcagtgg cgcaatctcg 62160
gctcactgca agctccgcct cctgggttca caccattctc ctgcctcagg ctcccgagta 62220
gctgggacta caggtgccca ccaccacacc cggctaattt tttgtagttt tagtagagac 62280
ggggtttcac cgtgttagcc aggatggtct tgatctcctg acctcgtgat ctgcccacct 62340
tggcctccca aagtgttgag attacaggcg tgagccaccg tgcttggcca ttaggatttc 62400
ttataaacat aaatgtctac tgaaaaaatt attattatta ttattattac tgagacagag 62460
tttcactctt gttgcccagg ccggagtgca gtggcgagat cttggctcac tgcaacctct 62520
gcctcctggg ttcaagcagt tctcctgctc agccttccga gtagctggga ttacggcatg 62580
tgccaccatg cctagctaat ttttgtatca ttagtagaga tggggtttca ccagtcagcc 62640
aggctggtct cgaactccta actcacccgc ctcggcctcc caaagtgctg ggattgcagg 62700
catgagccac cgcgtccagc caaaatatat tattttttaa tgtgccttcc gtaattagtt 62760
tttgtcacaa aatgaacaga acgatagtat gtgaatctca ttaattagcc atttgttgtg 62820
ggaaatggtg atgatataca gtgtcctttt taaaagctcg taagtagcct ggtttgggga 62880
gggggccaat tcaatagatg atgttgatat tttgactatc aattgtggtc aaaatgcagt 62940
taaagttttt tgtttgtttg tttttgttgt tgtttttgag atggggtctc gctgtgtcct 63000
ccaggctgga gtggagtggc acagtcttgg ctcactgcaa cctccacttc ctgggctcaa 63060
gtgatcctcc caccttagcg tcccgagtaa ctgggactac atatgccctc cactaagtcc 63120
ggctaatttt tgtgttttta gcagaaacag ggtctcacta tgttgcccgg actggtctca 63180
aactcctgag ctcaagtgat ccacccgcct ctgcctccca gagtgctggg attacaggca 63240
tgtgccacca cacctggccc agttaaactt tttagattca tagggctatg ggaaatcttt 63300
tcatgttagt tttggttact aaagcacagc cacagctgtt agggaaggtc atcctcatac 63360
cttaactgtc agagtctgag aattttgtat tcattggtag tttgaccagc ctgtcaactt 63420
attcttgaag aaatcctcaa gataaagatt aacatttttt ttctacctga tgatttttag 63480
gtaaagcagc agctttctct ttttgctata ataattcatg taatgtatag tttttccccg 63540
actccccagc tttttgagac atttgaaaca gagaagctga aaaaaaaact ataatgaact 63600
tttgtatacc catacttata tatttttcct gaattatgtg aaaattatag atatcatggc 63660
agtttacccc aaatacttcc atgcttatct cttaaaatca agaatattct tttttttttt 63720
tttttttttt ttttaagacg gagtctcact ctgttgccca ggctagagca atggtgcaat 63780
ctcagctcac tgcaaactac acctcccggg ttgaagtgat tctcccgcct cagcctccca 63840
agtagctggg actacaggca cctgctatca tgcccagcta attttgtgtt tttgtagaga 63900
tggggtttca ccatgttggc caggctggtc ttgaactcct gacctcaggt gatctgcctg 63960
cctcggcctc ccaaagtgct gggattacag gcatgagcca ctgcatccag ccaggaatat 64020
tctcctaata ccactatccc atctaggaaa tttaatactg attcaatatt gattcaatat 64080
tatatcatgt acagtccatt ttcaaatttc gttaattaaa tcaaaaatta ggaaagagag 64140
tgttttaatc caagatccag tcaaggttta tatatttcat ttggttatga ttttttaaat 64200
ctttcattat actggtattt ttgaattgtc taggccacat tctgatttgt ctgattattt 64260
cctcagtatt agattcaagt taaatctttt tgttaagaat gctataaagg ggccgggtgc 64320
agtggctcac gcctctaatc ccagcgcttt gggaggccga gacgggtgga tcacctgagg 64380
tcaggagttc aagaccagcc tggccaacat ggtaaaaccc cctctctatt aaaaatacaa 64440
aatttaactg ggcatggtgg tacacacctg tagtcccagc tacttgggag gctgaggcag 64500
gagaattgct tgaacctggg aggcagaggt tatagtgagc tgagattgtg ccactgcact 64560
ccagcctggg tgacagagca aaactccatc tcaaaaaaaa aaaaaaaaaa aagctatgaa 64620
ggggccaggc acgttggctc actcttgtag tcccagctac ttgggaggct gaggcaggag 64680
gatcacttga gcctgagagg tcaaggttat agtgagctat gattgtacca ttgcacccca 64740
gcctgggtga cagagggaga ccctttctcc aaaaaaaagg aaaaagattg ctataatggt 64800
gatgttgtgt acttcctgtt gcatcacatc agtagacata aaatgtcagt ttgtcctatt 64860
atcaataatg ttaaagtttg gtaaatttgg ttaatgtgat gtttcccagc tctctgcatt 64920
gtaaatatat catttccctt cttaaataga tatctgggca tggactttgg atcgtatgca 64980
tatcctgttt cccatgcatc ttttactcaa ttattttagc tttttgttgt ttgttgctat 65040
tgttgttgtt tttaagatgg ggtctcttgc tctgtagccc aggctggagt acagtggcac 65100
gatctcggct cactgcaccc tccgcctccc gggctcaagt gatcctccta cctcagcccc 65160
tctagtagct gggactacag gtgtgtgcca tcacacctgg ataatttttg tattcttttg 65220
tagacatggg gtttggccat gttgcccagg ccagtctgga attcctgggc tcaagcagtt 65280
tgcctgcctc agcctcccaa agtgctggga ttataggcat tgagtcactg tgcttggcct 65340
attttagtat ttattatcct ttcttgattc agttattaca tctggggttg caatatggtg 65400
attttccaat tgccatttct tctgcattta ttagccacct tttttttgtt tttttgtttt 65460
tcagatggga tcttgctgtg ttgcccaggc tggagtggct attcacaggc ctgatcatag 65520
cgcactacag ccttcaactc ctgggctcca gggatcccag gggctgggca tggtggctca 65580
cacctgtaat cccagcactt tgggagataa tttaactttc tcagaattag aaggctagta 65640
agaaaaagtg ccaggacccc aaacccaagc ctctttagcc tcgagtagct gaaatgaaca 65700
ggtgcatgcc ccatgcctgg taattagcta gcttttaaaa aatttttatt taaagaagag 65760
tttccccttt ttctctcctt ctttcagtat caccatggaa tcatgaattc tttctatatt 65820
caatgtactg tgctattttt ctgttgattt attggccagt gggatcctgt gtccttttga 65880
aatggcccca gtggtgtttg agttcttcca tcctttctga cacaaaatat ttcagactca 65940
ccttgtactt tccttgcttc agacctggaa gcagctattt ttttcaagga accctggttc 66000
ctttcagtgg cagtggtact tagaaaccaa gatctaggtg ctaagtatgc ttaaacaata 66060
accttttata aaaaaacatt gcctgaggcc aggcgcactg gctcacatct ataatcccag 66120
cactttggga ggccgaggtg ggcggatcac gaggtaagga gttcgagacc agcctggcca 66180
acatagtgaa accctgtctc tactaaaaat acaaaaaatt agccaggcac ggtggcgtgc 66240
acctgtagtc ccagctactt gggaggctaa ggcaggagaa tcgcttgaac ctgggaggca 66300
gaggttgtgg tgagccgaga tcgcatcact gcactccaac ctgggtgaca gagcgagact 66360
ccacctcaaa aaaaaaaaaa attgcctgac attttgatat tgtattacca tgtaagaggg 66420
ctttcacgtg tttttaaaat ggttcatctc ttggtccttc ctctagtact gtgtgggtag 66480
gcaggtcaaa tgctaaaatc tttatttaaa gatgaagaag gtgaagttta gagataattt 66540
aactttctca gaattagaag gctagtaaga acaagtgcca ggaccccaaa cccaagcctc 66600
ttaccagttt aatttttttt tctttttttt cataacatgt agtctcctaa aaaagtgtgt 66660
ttagatatct agtaaagcta tgtagttggt tcgtggtctt ggtcactgcc tgcccatcta 66720
aagagaacac ttcctttgtg agaataactg acaaacgacc agaaccatag aggatatttg 66780
catccttctg agcattgctt gggagctcct tttgtaatgg agtttgctaa atggctttca 66840
gcagtgcttt atgatgcatg gaatgtttga tgtgggactt ggagagctct gtttggaaac 66900
agtggatagt catacttata ccctttgtct catgacagcc tgtaattggg tcactgctag 66960
ggaaccaaaa gaagcagggc tgggtgcggt ggctcacgcc tgtaatccca gcactttggg 67020
aggccaaggt gggcagatcg tgaggtcagg agttcgagat cagcctggcc aacatggtga 67080
aaacccatct ctactaaaat ataaaaatta gtcgggcatg gtggcgggca cctgtagtcc 67140
cagctactga ggaagctgag gcaggagaat cgcttgaacc cgggaggtgg atgttgcagt 67200
gagccaagat tgcgccattg cactccagcc tggacaataa tagagtgaga ctccatctca 67260
aaaaagaagg aggaaaccct ggcagtaaat cttaaaatgc tatgcttatg ctgtaattta 67320
gtattgcgct gtttgaaggt ctctttctta accctactta ttttattttc actaaaattc 67380
aagtttttca aattaacaac ctaaggaaac tttttagaca tttgtcccct attctctgcc 67440
caccatgaga gtttaatacc tcagatatac tacatatctg tatactgatt atatttgtac 67500
ttcatatatt aaaagactaa gatttttttc atcctctgag aaccattttt tggccccttt 67560
ggggtgatat tacccttatt gaaaaacgca tggagtagat ggaaaaatga aggatacatc 67620
tgatttgtat tacaataatc cttttttttt ttttttttga gacagagtct cgctctgtct 67680
ccaggctgga gtgcagtggc gcgatctctg ttcacttcaa cctctgcctc ccgggttcaa 67740
gtgattctcc tgcctcatcc tcccaagtag ctgggattat aggcacgtgc caccatggcc 67800
agctaatttt tgtattttta gtagagacag ggtttctcca tgttggtcag gctggtctcg 67860
aactcccgac ctcaggtgat cacccacctc ggcctcccaa agtgctggga ttacaggtgt 67920
gagccactgc acccggtcag taatccatct tatagttgca tattgaaaat aaagtgtctt 67980
acttgaataa ctaaatgcag tctataaatt atttcagtcc taatagcact atgagctttt 68040
ggttacctga aggctaaact catcctactg atcaaagata gtaatttttc acaaagagga 68100
agcatcaaag tctacaatct taattttttt tccttccaaa gagttagata attgagattt 68160
ttgatgatac tggtgaattt aatctttttc aatattgttt ttctctggaa tttattgata 68220
tataagatat tacttgaata tattctaacg cttaaggtat acacgtgtta tttcacctaa 68280
aaattccccc tttctcctac cactacccca gatgaagaac tcttgagtct ttgagggagt 68340
agaaacatac tgcagtgagt tttttctttt taacatttac ccattacaga ccatcttgtg 68400
tgctttgtgg gtttttttcc taaacatttc cctgtatgtc tttttattat tgtttcctgc 68460
ctttgtctct tttccttgtc ctttgctttc tctacttccc ccaatattcc ttccatttct 68520
ctttccctgc cctcttattt cttccctctc ctacatgtta tctcctttct tttcattcag 68580
gactatgaga gtaaattgca ggccttgcag aagcaggttg aaacccgatc tctggctgca 68640
gaaacaactg aagaggagga agaagaggaa gaaggtgaaa tctagagacc gaaagtttcc 68700
tgtgtatatc tttttgaagt tatattatca aattagtcat ttatgcataa tcaaagctgg 68760
attcctctta gttggcccta tcatttattg acattttact gagccagttt acagatgatt 68820
gaatagatgt gcagtgttga gaatccaggc tgtattttat gaggtgggga tgggagtgat 68880
ggtgttctta gaccttcgca cggtttctga gcacgttcat ttaaatccct taggaagttg 68940
acctgcactg tctctcatat tgctgtagat gtgactttta gatatccatt acaaatatag 69000
agctgaatct ttgttaagac agtctgttca atttgaagcc atttctggca aatttctaat 69060
catattgctt agtcatgttt gcctatctta ggttttccag catttcttag ttgagttagg 69120
gctttcatta ttacttattg ccttataatg cagtatgaac acttaccaca attcgttttt 69180
ctacccaacc tcttggcaat ctcaaggctt caaactaact ttgaaggaac tcttaattta 69240
tgtattacat ttcttattta tttatttact tatttatttt tgagacagag ttttgctctt 69300
gtcgcccacg ctggagtgca atggtgcagt cttagctcac tgcaacctcc gcctcctgcc 69360
tcctaagtag ttgggattac agggatgcgc caccatgccc agctaatttt gtgtttttag 69420
tggagatggg gtttcaccat gttggtcaga ctggtctcaa actcctgacc tcaggtgatc 69480
tgcctgccat ggcctctcaa agtaatggga ttacaggcat gagccactgc gcccagccaa 69540
tttcttattt ttaaataagt gagatggtac cttttttctc tctgtaacag aaaaaaaaaa 69600
agaaatatat gaagtcatac aatgtaacag aggtcaggag tttgtctttt tttttttttg 69660
agacagagtc tcactttgtc acccaggctt tagtgcagtg gtgcagtctt ggctcactgc 69720
aacctctgct tcctgggttc aagcaattct catgcctcag cctcctgagt agctgggact 69780
acaggtgtgt gccaccacgc ctggctaatt tttgtaattt tagtagagat gtggtttctc 69840
catgttggcc aggctggcct caaacccctg acctcaggtg atccgcctac ctctgccttc 69900
caaagtgctg ggattataga cgtgagccac tgtgcctagc caggagtatt tttttttatt 69960
gtggccacat gttatacctt tttttaaaaa aaagaaaaat ttatttcttt gttgtactat 70020
tttaaataaa tacaaaagta ggatttttaa agattttctg aaatgtatat attgctttca 70080
taatatcaac agaatttggt aagctttact aaaaagaaac aaaagcagct gaatggagag 70140
aaactggaaa acacctcacc ttggtaaaag agataagcta gaagagaggt gtaattttta 70200
tatagttgta agattgccat taaatatgca tcattcattc tttcagttcc ttggacacag 70260
catgaatttg agttggccca atgggccttc cggaaatgga agtctcatca gtttacttca 70320
ttacgggact tactctgggg caatgccgtg tacctaaagg aggccaatgc catcagtgtg 70380
gaactgaaaa agaaggtatg gagcaggagg acacaggaga gctggaggca aagccgagcc 70440
tgctgtgggt gcatctgggt tctcaccttg aattaacctt tcctttgggg gaactcagct 70500
gctttgtgct ataaaacagc tttatatatg tctttattta atatatgtgg ctggaacata 70560
gtaggtgcta attaaatttt tgttaaataa aatttttaaa acatgttttc tagcttattt 70620
tacaaataga ctatattcct acaaaaactg gagaaacata gtaatagaaa aatacttctc 70680
ttgggtttct gaataatttt aattttcttc cacttttctg ttattaagtt tgcacctcta 70740
agtttggtat gcccctttgc cattgatctg cttctcctct agggtctgag aggcactgct 70800
gtgtagtgga caagcacatg caccctggac ccagacagtc caggttccag tcccagctcc 70860
acgagtggtg ttgatccccc tagccctgtg cctccattcc cccatctaca taatggggtt 70920
ggcagcagta cctggcacca agggattttg tgaggaatga aaatgggcat atgcataaag 70980
cacttatagt gtctggcatg gtaaaaggca ataaatgtta ttcatttttg taagtattac 71040
caaaacttca caccaaacct tagattcagt cttctgtatg cccccgtgtt tatcagtttc 71100
ttcactgcta agattacagt ggaagaccat attgtgtagt aggtaagagc atggaccacc 71160
taactttggg gaaattaatt atcttctcct agaactaaat tctagtttta gggaatacga 71220
tatgttcttc attcataatc ttggcagcct ttctcatttt ataattattt attagccact 71280
cctctcctca agtagactaa ataaaggcag taactgtctt tttttcttac tactgtgcat 71340
agcatttaat taatattaaa taattgttta ataattgttg gtgaaactat aacgtggtag 71400
ggagaaataa atgatgcaat gtcgtgaagt acttaccata tttattaaac acttttataa 71460
aaccatattt attaaacact tttttttatg agataggatc tggctctgtc gcccaggctg 71520
gagtacagtg gcgcaattat ggctcactac tgcctccacc ttctgggctg ggctcagttt 71580
gtagagacag ggtctcactg tgttgcctag gctggtctca aactcctggg ctcaagtgat 71640
ccttccgcct cggccttcta gtgtgctggg attacaggtg tgaacttccg tgcccagcca 71700
aataaatgct tttaagagca gtagttgcaa tcctctgaat ttaaaatggg aaaccttgct 71760
gtgttttcat cctctctcat tagtgaggga ctagatatgc attttgtggt cgaccttttg 71820
ttctctggta taaattctat aaaagtcttc attcattcaa tcaaaatgta tttataaaga 71880
ttttatgccg ggtgtggtgg gttgtgcctg tgatctgagc cgccgaggtg ggtggatcat 71940
ttgagctcag gagttcaaga ccagcctggg caatgtggtt aaaccccatc tctacaaaaa 72000
atttttaaaa atcagctagg ctggtgatgc atgcctgtag tcccaggtac tcaggaggct 72060
gagatggtag gatcacctga gcctaggagg tggaggctac agtgctctgt gattgcatca 72120
ctgcacccag cctgggtgac agagtgtaac cttgtcccca ccgtgcctcc cctcaagaag 72180
attttcgtat gtgcaaaaac actgccaggc taggaaaaca aagtccttaa atagcttaat 72240
ttgaagttag ggttattccc tatgactgct aacaggtggc catatttttc agtagaaagt 72300
tagtcacaag atatttgttg agcacattcg acctctttcc cattgggtat gattgtatcg 72360
tctcatctct gaagcttgct tgctgaaaac catttgtcaa tggtttattc tttctattca 72420
ggtgcagttt cagtttgttc tgctgactga cacactgtac tcccctttgc ctcctgaatt 72480
acttcccact gagatggaaa aaactcatga ggacaggcct ttccctcgca cagtggtagc 72540
agtagaagtc caggatttga agaatggagc aacacactat tggtctttgg agaaactcaa 72600
gtatgaaaac attcataaag gctggttgtt ttatttagga aataacaatg acctgctcaa 72660
gtgagctccc tccagctctc ctctctctga ggacactgtt gcttacttcc ctgtactttc 72720
taaatgctgt tgttggccaa tggtattacc atccatctgt tttcttttaa gtgaataaat 72780
gcactcctca cacactctga catttgtaga tatttaagtt taccctccat agaccctctt 72840
ataaatttta aatattccag tttcatcata tttctgttac agatttggaa cttatttatt 72900
ttttctatcc tggacctcag gataatcaat aataaacatg gaagttataa tctggcttct 72960
tttacatgta gtataatgag gtatttcccc cgtctcagtg tatttgagat gtcacataca 73020
gttgacctac aatattcccc cttatgactt ttcacccggt agaggatagg tggccattct 73080
tgacttttga tggcttatct ctagaatcaa gacactttag atactgagtg ttggtttctg 73140
tatttttggg cttcattctt tttagtaaca ttaaacagtg ggagcaactc cttttttttt 73200
ttttgaagaa aatcttgaac ggaagatgct tccaaagtga aagtttaatg caatctcatt 73260
atattaaccc aatgtgcttt gtgtttacta aataggcaga ggctggattt gatgcgagag 73320
atgtatgata gggcagggga gatggcctcc agtgcccaag acgaaagcga aaccactgtg 73380
actggcagcg atcccttcta tgatcggttc cactggttca aacttgtggg gaggtatgtg 73440
atgattttgt tgatgtcttc ttttaaaata atgattagtt ttaagtgttt aaaacctgag 73500
cagataatat aaaaagttaa gaggaaaaaa aaaggtgtaa aaaggcctta tctataatct 73560
tatccactca caaccactgc gtcatttggt atatttccct tcagttatat tcagcaggca 73620
tttttgtttc gactctcaaa ctttcagcca aagtatttaa atcttccctg actttaaact 73680
tacaaagtgt tctgtaattg attaagataa atttagaagg aaaagaaaac ttacgcctgt 73740
atcctacaca gtttaaattc atagtacagc ttatcagaca caatctcttc gtgatctgat 73800
ttgtttttct tttttttaaa aagcaattca tcccacttta gtcattgcaa gggtaaatta 73860
attgcacctt ggtagtcccc ttcagcctca tttttttttt atacattatt tccagcactt 73920
aatgaaaaca catagtgcct ctatgagtcc atattgagtc aagtgttctt agaggttttc 73980
aaattttcaa agcagggaaa tttaactttg agcttatgat ttcagaattt gaattgcttt 74040
ttcaattgca gaaaaccctg tgatgttttt taaagaggct gggtaaaaag gagttgctga 74100
agtcatagta tgttgtggaa atgaagtaaa tcttttccac tctgctctga tagcaaagta 74160
gtttttcaaa gagatgaatt ggccaaacac atcttaccct agcaaagcac atgctgcttt 74220
gccatatgcc atttcatcca ttttggtttc tgctccaaaa tgacgtgaga atttcagtaa 74280
agactgaaag ccttcttgtt tcactttgat gttgagtggg tcctttgagg cctggatgct 74340
gatcttttat gtgaactgga gaatgtaaac gtgttatagc acagtattta cctaggatgg 74400
tctactcacg tgtttccttt gcttttctta ttctaaggat tagtccttac ataacttttt 74460
caacttgccc ttgttccctt tcacctgacc caaatgattt atgcttattt tatccttttg 74520
cttttccatt tgcttttatt gtgttgattc cccagtggat tctgtcctgt tagcacctat 74580
tgcttcctgt ttaaccaact attcctctct ccctggctgt gttaattggc gtcttacctg 74640
gtgtctagct cccccatttt ccacggctgt gtgaacgagc gccttgccga ccgcacaccc 74700
tcccccactt tttccacggc cgattccgac atcactgagc tggctgacga gcagcaagat 74760
gagatggagg attttgatga tgaggcattc gtggatgacg ccggctctga cgcagggacg 74820
gaggagggat cagatctctt cagtgacggg catgacccgt tttacgaccg atccccttgg 74880
ttcattttag tgggaaggtt ggtgaggtta ttgtgagaaa ggcgaaaagg gaccagctct 74940
tgctctgaag gcctccctgc ttgcacaatt ttggataacc ttgcattagc caattcaact 75000
catgaatgct ctttttcaag ttctccaata cttcaagctc ttagtcagtg ctggtctggc 75060
tgagatggtt ctcaggcatt gtgtgtgagg tcctcagtct actcagcaca atacgtagca 75120
atatatgaag actgcatgga agacaccttg tcttaaattg cccagtaatt ttgttctgaa 75180
gaagggaacc ccaaaggaga aagtaaagct aagagtcaaa cgtagtgggt tcagtgaata 75240
gtattatggt ctagatgttt cctttttaaa ggattgtgat tgaaaaagca tatggaggta 75300
acatgagaga gggggcagag tgaggccaag gtttggctga agaatttgtg gtactgcagg 75360
agtcagagga caaaagtaga ttcaggagta gagttaaatt tattttttgc ttagtctact 75420
gaatcctaaa aagaatcagg aaagttaagc ttctccttta cctaaaacgg atgcctgtct 75480
ttggccagtt tcactgagcc tatctacgga tctttggcat ctgctaaaat tttaacctat 75540
agggtttata atgataaagc tcagaaaatc atctattgac cccatcagca gtgaatattg 75600
gccaggcacg gtggctcacc cctgtaatcc tagcactttg gctcacacct gtaatcccag 75660
cactttggga ggctgaggtg ggtggataac ctgaggtcag gagtttgaga ccaacgtgac 75720
ccatgaaacc cgtctctact aaaaatacaa aaattaggct gggtgtggtg gctcacgtct 75780
gtaatccagc agtttgggag gccaaggcgg ccgggtcacc tgaggtcagg agtttgagac 75840
cagcctgccc aacacggtga aaccccgtct ctactaaaaa tacaaaaaat tagccgggca 75900
tggtggcggg tgcctgtaat cccagctact tgggaggctg aggcaggaga atcacttgaa 75960
cccggaaggc agaggttgca gtgagacaag attgcacaac tgcactccag cctgggtgac 76020
aagagcgaaa ctccgtctca aaaaaaaaaa aaaaaaatga atatttgttt ggtgctatat 76080
gaaactgctt tgtgttaaga ctaggtcacc aaactgaggc attgtctctg gctctcagtt 76140
tgtgcatcca tctttcagtg atgcagaact ctctatgcac atgatcctaa tactggtgtc 76200
tttgttcatc tccagccttc atgctgaggt aatttaaatt acaagttcaa cagtagagca 76260
tgcattgaat taaccaaaaa gtatgtccag gtttaaaagt attaaattta aatttaatgg 76320
tacattagaa tcacatattg ttgaatcctt tataaataag ggaattatgt cagtgttaaa 76380
tggatgagta ccagttatct aaaaattaaa ggatagcgag tctcctgacc tagtagagac 76440
ataggttcta aaatttatct caccaggtgc agtggctcac acctgtaatc tcagcaactc 76500
tggaggctga ggccagaagt ttgagaccag agtgggcaac atagtgagac cctgtctctt 76560
tgaaaattta aaaaacaatt agctgggtgt tgtggtgcat gcctgtagtc ccagctcctt 76620
ggaaggctaa ggcaggagga taatttgagc ccaagagttt gaggctgcag tgagccatga 76680
ttgcaccact gcactccagc ctggggacag agcaagaccc caagtctacc aaaaaatata 76740
aaataaaata aaacagaatt tgtagtatgg tattggtcac atggtcggaa cctcataact 76800
aaattagaag atgatcattc ttatggaaag ggggaaaaga aaagcttttt tgatagaaat 76860 
gaaagcctca agtgtaactg taggtcatag tattagacag tctctgggtt ttccagtggc 76920 
aaaggagcat ttagaattca gtttttccat gaataaacat aaagggatta tgtggccatg 76980 
aatattgttt tctactggca tatcgccatg aatagataac gtagaatttc agagaagctg 77040 
atttttatta aaattgagaa aggtatgtac aaataatggg acaataatgc ttgcttcatt 77100 
tgatggctgc ctgtgttctt cagcacgttt ccatcctgtt ctaagggagt tggaagttaa 77160 
aacccagttt ctggccataa actttgtccc agtgctaatt acacagtttc taaaactgtt 77220 
cgcttgtgtt ctataccatt gcagtgaaga gtaggcagct tcatcttacg taacatggtg 77280 
aggagaagcc tgacctaatt gtcaggagaa gccttgggtc ctaggccatc tctgtcacta 77340 
tgacaagcaa cttaaagtga agcctttggg ctctcaagtg taaagtgaat aacaggtaga 77400 
aataaagagc cataaagatc tttttcaact ttcatattct gtcatctaca ctagcaaaaa 77460 
tgaatatcct ggttaagact tctactaaga agcatttgta ttttgaggaa gttgtgtgga 77520 
aactcaagac aatcccactt aatgtggaaa ttaactgagg atacatgtat tgttttctct 77580 
ctagttttat agttttcttt cttttaatca tgtcagatgt aagtataggt ataaagcaat 77640 
caactctatt ttgattttca cataaaatac ttatggagct gtggctcagg aattgatttt 77700 
tccatttgtt taggttccca agttctaacc ttgcaagaaa ggactggggg aggaagtgaa 77760 
ataatataat agcaatcagt aaagtgaagg atagctttcc atttttgccc ttgactacat 77820 
ttctcttcta caaacaaaat ttacaataat gcaacatcac ccagttgtgt agtggtcatt 77880 
gtatgtgtaa ggaaagtaac tactttcctg tttttgtgat tcacaagatt ggatttctcc 77940 
tattggcata taattgccaa tataaatcac ttgtaaaatt tttagtttct ggctgggtgc 78000 
ggtggctcac tcccgtaatc ccagcacttt gggaggccga ggtgggcaga tcatgaggtc 78060 
aagagttcga aaccagcctg gccaacatgg tgaaacctcg tctctactaa gaatacaaat 78120 
attagccagg catggtggtt tgtgcctgta atcccagcta ctcaggaggc tgaggcagga 78180 
gaattgcttg aacccgggag ggggaggttg cagtgagccg agatcgcgcc actgcacttc 78240 
agcctgggca acagagcaaa gactctgtct agggggaaaa aaaaattagt ttcttgtgaa 78300 
attccattag taagtttgtt ttttagccat gactgcctct gctaaaactt tatattgcaa 78360 
gtgaaaaaga gcttgtccct ttagctattc tcttatggca tgctgggatg ttatgaacca 78420 
tccatcataa tttgggatga aatgtaaaat atttaagcta gtcttctggg ttaaaagggg 78480 
gttaaagatg atgattctgt gatctctcta agtaccttaa ctgacattct ttctataagt 78540 
agcaccataa ttttgctaat ttttcttatc tttatgttga ccagtttgtt ccactttttg 78600 
aagtaacttc attgaacttt ctgctaacgt atcctttcca tattcccctc tctcatccct 78660
gttctccttc agtctgctag gtcaaagtat aaggattctc aaaaatgcag agtaattatg 78720 
aaaattctgt ttaaaaagct tcagaataga atgtttatat agcattgatt tggagctaag 78780 
ggctatattc tggtaacata acaaggtgct atctagtgaa tgtgtgattg tgtagtcatg 78840 
gagctgatgt ccctgccata gaattacaga attatctaaa aagggaaatc tataataatg 78900 
gccttcaatg ctcccacact tggactagcc attgccatag gcagaaaggc aatctcatct 78960 
ttactgtgtg tgcaaacagt actttaatgt aatgtgtgct taatataagc tttctttaaa 79020 
aaaaaaaaag gtggctcctg tttttgaata gctattttta agatagatat agttaggaat 79080 
ctaaatgtgt tctatatagt taatatccat tatgaggtgg ctctgaaaaa tcaacctagt 79140 
ataagttgga tggctttgct tttctgcttc tgttayaacc catttttcta gaaaagcttc 79200 
cacttctact taaagataag aggcaaatca ttttcctgtt cccttatgca catagatgtt 79260 
cacttaccaa atatttatag agtggcccag ccctctgcag acagtggtga gcaaaaccag 79320 
acatggttcc tgccctcata gacttatagt ctgatggatg acacagacat aaattaaata 79380 
attgcacaaa taaatgtaaa ttatagcagt gatgaaatag ttgtgtgtgt tgtcggagat 79440 
cacataacgt tgggggacct gttatgagaa agtggtgttt gaaataagac ctgaagatgt 79500 
gaaatcaaag gcgtgaggct gggtgtggta gctcacgcct gtaatcccag cactttggga 79560 
ggctgaggtg ggtgaatcac ctgagatcag gagtttgaga ccagcctggc caacatggtg 79620 
aaaccccacc tcaaaaaaaa aaaaaaagac atcaaaggca tgaggtgtag gcatcgagtg 79680 
cagagaggac tccaggcagg ggacatgtgc gatggctctt taatggccca ggccttgaaa 79740 
tgggcaagtc tagctggggt gcccataata gaggagggtg atctgtgtat gtgttgactt 79800 
tgaaagtgaa attatccatt tgatacctta cgccaaattc tttgtctttt aaattctgtt 79860 
gcagtagttg ttgtttgagg gtcttgttta gaacttttta gaacttttca ttcttacgtt 79920 
tttttacacc agagcttcaa ctctctgttt cttagcattt ctctctctcc tgctttggag 79980 
cagactgaaa gggaagtggc attatttcat ctgatktaat gttcatttgc agttattttt 80040 
gttatatgtt tcatttgtat ttcatttatg ctctagttat gttaacattt gtttttctgt 80100 
gatgtattat gtctgttttc ataattttca ttactttgag ttgaaatccc aaatacccag 80160 
tctatcctcc tcctctcttt gttttaaaat agatttcaga gcaggaaaag catcagataa 80220 
acatgtatgt cacttggttt tgtttgattt attgatacaa gtgtctgtct ttgtcttttg 80280 
tcttatgttc gaagatcagc atataccaaa tgggaagact ggaatcatag taggaactag 80340 
attctatttc cctttcacta attggaatga acactctagc tttgtgacca ccagttttta 80400
ttgctttttt aaaaaagtag tagctacaca ataaagaaga aaatacaaat gtgataggac 80460
tttgctagtt tgcttgaaat ataaagtgtt agagctataa tttagctttc acttatgcat 80520
ttctgaatta ggttttgaac tctaggagca aacgcaaacc ttttaggcat caactgtcat 80580
cttctccctt aacatttatc ttttttcttt ttttctcttg agacgaagtc ttgctcttgt 80640
cccccaggct ggagtgcaat ggtgcgatct cggctcactg caaccttcac ctcctgggtt 80700
caagcgattc tcctgcctcg gcgcccccga gtagctggga ttataggtgc ctgccaccgt 80760
gcctggctaa attttgtatt tttagtagag atggggtttc accatgttgg ccaggctggt 80820
ctcgaactcc tgacctcagg tgacccgcct gcctcggcct cccaaaatgc tgggattaca 80880
gacgtgagcc accatgcccg gcccttaaca tttatcttga tgcccattca gagtctcctc 80940
taatcttaga actcaaaaat ttcttgctgc agagcttctc tgcccatatg tgtctgctga 81000
cctcaccttg tcagtcaccc tgcctgaaga taatagtcat tttttttttt tttttttttt 81060
tttttttttt gagacggagt ctcactctgt cgcccaggct ggagtgcagt ggcgcgatct 81120
cggctcactg caagctccgc ctcccgggtt cacgccattc tcctgcctca gcctcccgag 81180
tagctgggac tacaggcgcc tgccactatg cctggctaat tttttgtatt ttttgtagag 81240
acggagtttc accgtgttag ccaggatggt cttgatctcc tgaccttgtg atcggcccgc 81300
ctcggcctcc caaagtgctg ggattacaag catgagccac cgcgcctggc ctcttttttt 81360
ttttttgaga cagagtctca ccctgtcacc caggctagag tgcagtggtg caattttggc 81420
tcaccacaac ctccacctcc cgggttcaag cgattctcct gctacagcct cttgagtagc 81480
taggacaaca ggtgcatgcc accatgcccg gctgattttt atatttttag tagagacagt 81540
gtttcgccat gttggccagg ctggtctcga actcctggcc tcaggtgatc cgcctacctc 81600
agcctcccaa agtgctggga ttacgggcat gagccaccac tcccagccaa tatagttatt 81660
tcttaaagta acaataaatg gccaggcacg gtggctcacg cctgtaatcc cagcactttg 81720
ggaggccgag gcgggcagat cacctgaggt caggagtttg aaaccagcct gaccaacagg 81780
gagaaaccca gcctactcag gaggctgaga caggagaatc gcttgaaccc gggagataga 81840
gtttgcagtg agctgagatt gcgccattgc actccagcct gggcaacaag agtgaaactc 81900
catctcaaaa taaataaata aataaggtaa caataaagac ccataggcca ggcgcggtgg 81960
ctcacgcctg taatcccagc actttggaag gctgttgcgg gcggatcacg aggtcaggag 82020
atcaagacca tcctggctaa catggtgaaa ccctgtctct agtaaaaata caaaaaaatt 82080
agctgggcat ggtggcgggc gccgatagtc ccaggtactc gggaggctga ggcaggagag 82140
tggcgtgaac ctgggaggca gagcttgcag tgagccgaga ctgcaccact gcattccagc 82200
ctgggtgaca gagtgagact ccatctcaaa atcagacaaa aagcaaccaa taaagaccat 82260
atgctgtgat atggaattta ttttagagtt attaactgta tttcgtccca tttccctgcc 82320
ttttttgtct cctggtttct gttttatcta ttgggaaaaa tgaaacaaaa cctctttata 82380
atttcttaaa ttaaaaggtt aagctcgcct gggcacggtg gctcacacct gtaatcccag 82440
cactttggga ggccgaggcg ggtggatcac aaggtcagga gattgagacc agcctggcca 82500
acatggtgaa accccatctc tattaaaaat acaaaaatta gctggctatg gtggcacgtg 82560
cctgtaatcc cagctactcg ggaggctgag gcaggagaat cacttgaacc cgggagccgg 82620
aggttgcagt gagctgagat cgtgccactg tactacagcc tggcgacaga gcgagactct 82680
gtctcaaaaa aaaaaaaaaa aaaaaggttg agctcttgaa tgggctcctc ctccaacctg 82740
cccgagcagc attcatagag cagtatcttg aaagtctgct tccaggtgtg actcttctcc 82800
ctacagtagt taagttttag agcatgattt ggtgtgtgtt ggtcttcgtc aacagttctg 82860
ccaagcgttt gcttgttcat tatctgtgct gaaaaggcag ctggaggcaa atctgagagg 82920
ctaggactga gaaggggtct taagacaagg ccagaagtca gctcacagtt gcaggaagat 82980
gttctcagtg aatctgaaag ccagtttatc tctccatgga tgttctgaca tttgtctcct 83040
ggtttctgtc tttagggcat ttgtttacct gagcaatctg ctgtatcccg tgcccctgat 83100
ccacagggtg gccatcgtca gtgagaaagg tgaagtgcgg ggatttctgc gtgtggctgt 83160
acaggccatc gcaggtaggt gaccctcttc tgaaatgaga gctgtgagtt ctttgtctag 83220
agacaaagca ggctgattct gcgtgggtca cgcttatatt acacatacaa actgtgggga 83280
gtttgtatgt gtaatataca gacacatact ttggagaaag cctttattgc tcctttctag 83340
caattgttcc caagtatcac aaatagatcc tgcttagtcc agactactgc tactttgcaa 83400
taatggggta aagagatttc cttaaacatg ttaaagaaca gctcttgcat taactatata 83460
aaagtcattg gtgtgttgtt tttttttttt ttgagacagg gtctcacttt gtcgtccagg 83520
ctggagtgca atggtgcaat catatctcac tgcatccttc acctccaggc tcagccatcc 83580
tcccacctca gccacctgaa tagctggaac tacagatgca caccaccatc ccagctaatt 83640
tttaagttta taaattttat agagttgggg tctcattatg ctgcccagca tagtctcgat 83700
cgaactcctg agttcaagtg atcttcctgc cctggcctcc caaaatgtta ggattacggg 83760
tgtgggccat tgcacccggc cagtcattgg ccttttattt tgttttgaga cggagtgtcg 83820
ctctattgcc caggctggag cgcagtggca caatctcaac tcactgcaat ctctgcttcc 83880
tgggttcaag cgattctcat gcctcagcct ccctagtaaa tgggattaca ggtgcccacc 83940
atcacaccca ggtaattttt atatttttag tagaaacgaa gtttcaccat tttggccagg 84000
ctggtctgaa actcctgacc tcagatgatc tgcccacctc ggcctcccaa agtgctggat 84060
tacaggcgtc agccactgca cccagcctgg cctaaataat atatcttaac tgcttcatta 84120
ttaacaaagg attgttaggc ctttgtcttg cttagttgtt gtttttttgt ttgtttgttt 84180
ttttctcaga gtacttgtaa tagacatttg tttggctgtt gttttgtaat gcacgggtca 84240
agctgatata agctgattgt ggtggctcgt gcctgtagtc ccagctactt gggaggctga 84300
ggcaggattg
ctcttaaaaa 
agcctatgtt 
aatgctgggc 
caccagtgtt 
tgttgggtat 
acatttgaac 
gttgcccagg 
ttcaagcaat 
cgcccggcta 
tctcgaactc 
gcgtgagcca 
aggaagttga 
ggtatcattc 
ctccccctgt 
gaagctcctg 
aatgaatact 
aagaatgttc 
tgccacagag 
atagactcaa 
attttccagt 
tacgttttta 
gtctgttgca 
gggtcagagt 
tatgtagaca 
tatcaaggga 
ctttgcatta 
actttgctgg 
aaactgggca 
gagtatgcag 
agctgctaac 
taacactctt 
gggagtgaga 
gctgcacatc 
tctgatcaca 
taataagcct 
aaaggaaaat 
atgtagggaa 
gccctctcaa 
ttttgtgaat 
agggccacag 
actggtgcca 
tggccgtccc 
aattgagtca 
tttgctacaa 
tgtgttttca 
ttttggaact 
tttggttaac 
ctaaaaagag 
ttgaaagcag 
aaggtatggc 
ctgcggtgcc 
ttctcagacg 
ccccagttga 
atggattttt 
ttttatttca 
tgaaaataac 
tcgagacctg 
cagtgaaatc 
ccagggtaag 
tatgtaacta 
tttgtctgta 
gagttaattg 



cttgagccca 
acaattgaaa 
ctgaggaaag 
tgtgtccagt 
gagtatatgg 
tattgactat 
aatctcaaga 
ctggagtgca 
tctcctgcct 
atttttgtat 
ctgacctcgt 
ccgcacccgg 
taaatctaca 
tctcatgcca 
gtagtctcac 
attatggctc 
ttaatcaggt 
agcattggag 
tagttcaata 
tattcatatt 
cactattatt 
tactacttta 
atgactcgtt 
tctgaggtca 
tagtttactg 
catagtggcc 
tttgaaccat 
atggtaagat 
gtgccttcac 
atatcttctg 
cgaggtgaca 
gagacaaaga 
accagaaacc 
ctgctgcagg 
taagcctgtt 
gaaagcaatg 
ttaaacaatg 
gatgtgatac 
accctgtgct 
cttcagtttt 
tacaagagac 
ccaagaggct 
atgggtccta 
aatgtgattt 
gtcatagttg 
ctcataccct 
gttgctatta 
tgatgtctga 
cagcatctct 
cctcaaaggc 
tgcagagcac 
agcaatacag 
ggtgaactaa 
aaatagtttc 
ctatctggca 
gactaataat 
taaatatagc 
tgggaaaacg 
attgctatta 
tcaggaaaga 
tgccatccac 
caaaatgaca 
aatgtacttc 



ggagtttgag 
aaaaaaaaag 
aatgtggtca 
gacctttcag 
aagtctcctc 
tatcttgcca 
ataaaactga 
gtggtgcaat 
cagcctcctg 
ttttagtaga 
gatctgccca 
cctgaaaccg 
tgaatagagt 
aagacttgag 
tcaattcttg 
tggaattcga 
gagaaaccgt 
atcagttctc 
gaatatggga 
ttattcattg 
tgttggacag 
ccatgtatct 
ctggtctgtc 
tcactcctcc 
tgcttgggga 
ttcatcaact 
ctgtgtcttc 
ggtaatggaa 
tttccgagta 
tcagttcaag 
tctcttgtgc 
ctgatcagtc 
tgtcctggaa 
gctctgctgt 
gatgagtctt 
acttatctag 
taaagaagga 
ctctcatttc 
ctggatgctt 
aatacctcca 
tgaagtgtga 
gctttgggtt 
gtgcagttta 
aaattaaatg 
ttggaatatt 
caaacttagg 
ttctaatcag 
agtgacaata 
acctggtgtt 
gtcctgtgtt 
tgtgaatgag 
gaatcaggga 
tttggaagac 
tttttctcca 
gtaaatcttt 
agggagagag 
ccattgtttt 
cttgctgtct 
aggaaaaaaa 
ctgcctggaa 
aggctctcgc 
ttgtttttat 
ttcactctct 



gccaacctgg 
attttggtat 
tttgctgtaa 
cctttcccct 
cacaagtaca 
gcatggagac 
ttcttttttt 
ctcggctcac 
agtagctggg 
gacggggttt 
cctcagcccc 
attctttatt 
gcgaagcagc 
tttataatcc 
ctaatttttt 
cagtcaggaa 
caggaagaag 
ctgtaaaact 
catgtctaac 
cctttcctca 
ataaacagaa 
gcccctgcct 
cttggaggag 
agaagaaatc 
cattttcgac 
aggaatggaa 
atttgaccct 
gggttttctg 
acagtgttgc 
taagctgccc 
actgtagagt 
tctgaaggaa 
tgcaacaggg 
gagcggtccc 
tctagtcctc 
ctagcaatta 
taaattctac 
tcatctacat 
tcttctcaag 
aacttcaatt 
cccctccata 
ctgccatccc 
taaagctttt 
tgatcactat 
acccattgag 
gcaggatagt 
gcaactgata 
atagtgagtt 
agctctgggg 
tttatagtcg 
ttcacctgtc 
tttggagcac 
aacaagttca 
ttttttcttg 
gtgttgggtc 
agagagagaa 
aacaaatctt 
gggggagaga 
cattactagt 
gaggtgacat 
actaactgtg 
ttttagtaaa 
ctaaagcatt 



gcaacatagt 
tttccatttt 
tctttctgtc 
atggaaactt 
atataatatt 
agtcaggatt 
ttgagacgga 
tgcaacctcc 
attacaggtg 
caccatgttg 
tcaaagtgct 
gactagtctc 
ccatgccgtg 
agcaagagct 
tttctgcttt 
cagctaaaat 
gaaaaccctg 
gaaaaataca 
aattatgaag 
gacagccctc 
gtaaggcaac 
ttttttcaga 
ttgaggattg 
agtcgaatta 
aaagggaaaa 
agcatgccca 
ctttagattt 
aagagattgg 
aggccagtgg 
ctttgctctg 
gctttacatg 
aatactttca 
ttccctggca 
cctttaccag 
ttcatgttga 
cgcctattta 
atcaactgaa 
aggacttgct 
actgccccag 
tgtcttttca 
cctaccgccc 
tctcccctca 
gtataggttt 
atagtaagcc 
cattgatttc 
gttcaccaga 
cgttttatgg 
gacctttgtc 
aaaagggcat 
ctcctctggc 
gtcagggaac 
ttggcaaaag 
taytaccagt 
atggaagggg 
ctctttaact 
cgataacatc 
tccagagtat 
ggcatataaa 
aatgtgggaa 
ttgaatccac 
gaagcttgag 
aagcttttac 
cttgcatata 



gagaccccat 
tccatgatcc 
ttaattccca 
gtgacacaat 
ttcaaatact 
tctaggtatg 
gtctcgctct 
gcctcccggg 
cccgccacca 
gtcaggctgg 
gggatgacag 
atttgagctg 
ttactacttt 
ttttccctcc 
agcggatgaa 
atcttttgat 
tggaaagaga 
gggtgctggt 
gtaaaggtgt 
agtccatata 
tggggaaaaa 
gtgacttttc 
tggaaggaca 
atgacttggg 
tattgaccat 
actccctcct 
gaagtcaagc 
caaccacctg 
aatcctccca 
cctcccagct 
tgtgagggat 
tcactcctat 
ggaggagcat 
gaatgtaggg 
aatgtcttat 
atactcggga 
aaaagctagt 
ctttcccgcg 
gaaaacattt 
gtacttggaa 
tcagtttgcc 
gaatgtgcct 
tatcacagaa 
actaatatca 
acctatttaa 
tgaatctgtt 
aaaacatcta 
ctacaatctc 
gttgcttctg 
ctgtgccggg 
cccaggaatg 
catgctagtg 
ctgtctgtcc 
tgtccttctg 
gtgcagttct 
tcactrtctg 
cttctgtgtg 
tacattatta 
cacaaagctg 
aacagactct 
tctgctaaat 
aaaggaagag 
cctccatata 
gctaagtgat
tctgtgcaat 
agtggaatgc 
cgtttaatgc 
gatgaagcat 
tatcatgtgc 
cttggctttc 
gggagagtag 
agcaagcctt 
aatcttccat 
catgcttaat 
tgttgctcga 
ttagatctgt 
tattcctacg 
ctctattcca 
tgttagtcac 
aagataatcc 
ggctgggcgc 
cttgaggtca 
agaattgctt 
cagcctgggc 
taaagttact 
ctctggtatt 
gtctgctctc 
aatatgccat 
ttgatcagct 
gatgctccca 
tcatgcaaag 
agttatttct 
tggagaaagg 
aacttggcat 
ggccttccag 
tgaaaggagc 
gccaagatca 
ctttagaatt 
taaaataccc 
tcatgcctgt 
tcaagaccag 
gccaggcatg 
tgcttgagcc 
ttgcctcaaa 
atgtgtattt 
actgaatcat 
tatcagcagc 
caaacatttt 
tcttttgagg 
aacttcacta 
tgaccaggtg 
ttgaggtaat 
tatgaaacag 
ttctgatgat 
ttagagggag 
ttatctggta 
acatgctatt 
gggtttattg 
aattcacaca 
gttcatctta 
gaatccagaa 
agtttctctc 
tgtcacagaa 
attcaatttg 
ttgatctttg 
acccatgcca 



cagggtcagt 
atggaaaagg 
ttctttctgt 
attgttcacg 
tctccacgga 
agaatgtaag 
tcagcccggg 
acaatagagg 
cttgtctgtg 
cagtttgaat 
aagagtttta 
ttgcaaataa 
tttccttgca 
tggagtaaga 
aagtcattat 
tgttctgaga 
ctgtgttagc 
agtggctcat 
ggagtttgtg 
gaacccagga 
aagaaagcga 
catcagtgtt 
caggatattc 
taaagtaaga 
cttctttttc 
gctgtgcaca 
gtgagcttcg 
atgaacatga 
attaatggaa 
ggataggaag 
tgtgcctggc 
gcagggttgc 
agcaggtgag 
gttgaggcaa 
aaaaacaaca 
agcagatact 
aatccagcac 
tctgggaaac 
gtggtgcatg 
cgggagggtg 
aaaaaaaaaa 
tcttgtaatc 
ttgtggatta 
acccacttca 
tgatggtatt 
gattaagata 
tcatcttata 
acctcaaata 
aggaatataa 
aagctctaag 
ggcagagtgt 
acatttttta 
ttatgccaga 
gataccaaca 
atttctacac 
aagagaatta 
atgtctgtgt 
cttcaaatct 
tttttccctg 
cttttacgta 
cagtccagca 
tcttcctttc 
ctgtccaagc 



taatttttca 
ttcttgacaa 
aggatccaca 
agtcttttta 
gcccctcaaa 
tgacatggac 
aagggtgata 
gcgtggctag 
ttacagtggg 
gcaatgagtt 
ttaaagtgaa 
cataaaaaat 
actgatagta 
aaacagaccc 
cctcatcaca 
gtttttgtgt 
tgagatgact 
acctgtcccg 
accagcctgg 
tgggggggtt 
gactcccatc 
acaagaggca 
actactacat 
gaggaaagga 
atgttggtat 
aggattgtgc 
aattacgata 
gaaaatattc 
agctatttta 
ctccctgaag 
acataggcca 
agcctgggcc 
tgagtgagcc 
agtatggaag 
taggggatcc 
ggacactcag 
tttgggagct 
atagtgagat 
tctgtagtcc 
aggccgcatg 
aaaacccaaa 
ttttcctaat 
catcaaaacc 
tctgcaagga 
gttttgattt 
actaaataag 
gccacctaat 
ccatcttttt 
catgaaaatt 
tacagtcatc 
tggtgttggc 
aatgagattt 
gttgaaaatg 
tccaactctt 
tgcaaatagt 
gaaagcgatg 
gccagttctg 
gcaaatagaa 
ctttcagaag 
cattctcact 
caaaggggag 
tttgcagtcc 
caggtgagca 



gtcatttttg 
ggaaagtaaa 
ttgggctctt 
ttttttttat 
aacaatggca 
ctttttgcca 
agaagaggta 
agtggtgagg 
cctagtgaat 
gttttagtgt 
tgcattgtcc 
atcaaatacc 
taattaacac 
ttccctctct 
atagctaata 
aatttaatct 
gaagcataga 
gcactctggg 
ccaacatggt 
gcagtgagtt 
tcaaaaaaaa 
gagataggac 
gaaactgtct 
gaacttcaaa 
aaatataagc 
ctgacaactg 
ggggaaataa 
actaaatgca 
agttactttt 
gatgtagtat 
tagtcagtac 
agtccttggc 
agaatagtaa 
atcttgaata 
cctgtttttt 
cagcagtaaa 
gaggcagaag 
actgtctcta 
tagctactcc 
agccttgttt 
atacgtacta 
cttgcttggc 
aagcctattg 
caggagctta 
ttagtttctc 
tattctatat 
cactgatttt 
gtaggaagag 
gttaaaaatt 
acaactgctg 
agaagaattt 
agcctgtttc 
gaaaagtctg 
tgtagtcttt 
cattacaaaa 
gtaatgacca 
tttacgttaa 
tgtcttgaga 
ctatacataa 
cctacttgtg 
aatttctaaa 
gcctcagccg 
ctcgctccgc 



aaataaaatc 
taacttaaag 
gaaagagatt 
gaagcttttt 
gaggaagtcc 
aacatatgtt 
caagtcactg 
tcatccacag 
gtgggttcgt 
ttctgtccta 
tttaaaccct 
ttctgtagct 
attaatgccc 
tacgtgaaat 
ttcactgaga 
ttacctttat 
gcaattaaag 
aggccgaggc 
gaaatcccat 
gagattgtgc 
taaaaaagta 
ttaagctcag 
ctcattctct 
gatggaacaa 
aagcatgtat 
gggaatgcag 
aatgcataca 
gtgggagtgg 
tgttggggag 
aagcaagaaa 
atgcatggat 
acagctgagg 
gtgtgggaag 
cctgaagctg 
gttgcaagag 
attgggccag 
gatcagttca 
caaaaaattg 
ggaggctgag 
gcgtgctaca 
aagttctgat 
tttagattgc 
tatttgaagt 
acaggtttgg 
aattgctgga 
taccctgttt 
agatagagca 
actttgaata 
gttcctaatg 
aggaaagcaa 
ttttatttat 
actatctagg 
cttccttaaa 
ttctttctct 
agtttttttt 
gcaagaggaa 
tgttggaaaa 
taggcacgtg 
aatgttgttg 
aaacccctaa 
ataaataact 
tgccgccgat 
tttttgcatg 



taaaatgccc 
aatctgctta 
ctgataaaac 
gcatcgccat 
cctggccttt 
tttctggtac 
gctgtgcagg 
cggcctgagc 
ctgccaataa 
tgtgaataca tgtcttattc tgaatgactt ttcaaataga agtcatgttt gaattcctat 91920
tcttctatat atccacatgt attcctgctc ctcttgtatg tatctagtag gaactagaat 91980 
tcctagttcc tgatttaaaa taagaagtaa aggctgggcg cggtggctca tgcctgtaat 92040 
cccagcactt tgggaggccg aggcaggtgg atcacctgag gtcaggagtt caaaaacagc 92100 
cagaccaaca tggtgaaacc ctcgtctcta ctaaaaatag aaaattagcc gggcatggtg 92160 
gcacacgcct gtaatcccag ctacttggga ggctgaggta ggagaatcgc ttgaaccggg 92220 
gagatggagg ttgcagtgag ctgagattgc acctttgcac tccagcctgg gcaacaagag 92280 
cgaaactccg tctcaaaaca atataaaata aaataaaata aagtaataaa ataagacaag 92340 
aagtaagaaa caataggggg ccttatctac aaactctctg gtcttctatt tgcagatttt 92400 
gataatcact ctgttcctaa atctttctgc tttgcatgta acttcccata actgtcaggg 92460 
gactaggcat ttctttgcat agtaaattta ttctgtatgt tcagtgatct tgttttgatt 92520 
tttctttatt tctgtccttt ttcctcattg cctattcgcc aaattcccac cccaccctgt 92580 
gctcactctc caccctctgt attccccatc ttcatctgcc ttcagaccat gagagaaaga 92640 
ttgagctgat tcggtttctc gtgtctggct atgtgaatgt gcatatgttg gacacatttt 92700 
ctgagcacgc cagtgtgctt gcatcctccg ctattgccac cttccaggat gaggctgacc 92760 
cattgccgcc ttttctcttt ctgccagttc ccagctgcca gagcgagagg gtacccaaac 92820 
gagatggtta gtagctgaat tagctccacc ttagatgtag aaccatcctc aggcgcttgt 92880 
cctgaagcct ggctctctgc atatgaaaca ctgacatttc cctttgcaac ttgagctgaa 92940 
tgtagagctc agcacatttt caaggttgta tcacattttc ttcctagaga tttgggaagc 93000 
agcttctcta catgtgcctc tttcacatca ccatcgcagt gggggaggcc tttgcatttg 93060 
tttttatgga ttctttgcaa tagtattttt aattttgaag attttactag tagttggttt 93120 
gttagaatac ttctggtttt gctttctgac ccttagatga ctgccaggat cacttaccct 93180 
attttctttc tgactagcat caccaactca tgtgaaactc agctgggctg aatactgctg 93240 
atagggtaca cgtgtgaata gtgctaaatg attcataaag ctcctttatg tacattatgt 93300 
tagtggatct tcatgattgc ttttcaagat agtaggctaa atattagggt ctacagttta 93360 
taaatgagga gacagactcc cagaggttaa gcagcttaac taaagtcaca tagcatcagt 93420 
acagttaaga atatggagtt taaatctggc cgggtatggt ggctcgcacc tgtaatccca 93480 
gcatttggga ggctgagtcg gacagattgc ttgagctcag gagttagaga ccagcctggg 93540 
caacatggcg aaaccctgtc tctacaaaaa aatgcaaaaa ttagccgggc gcggtggtgc 93600 
acgcctttgg tctcagctac tcaggaagct gaggtgggag gatggtttga gcccagaggg 93660 
cagaggttgc agtgagccga gatcacgcca taaaaaagaa aaaaaagcat atggaattta 93720 
aatctgaaga aaatgtgtga cataagatca tttacagaat cctgtattta taaatgatac 93780 
cagtagttgg tttccaactt tgtcctcaca ttggtatcac ctgggaaact ttacccctca 93840 
tggaggaaga actttgtctt tcccttagag attgtgattt aattgatctg ggatgtgtcc 93900 
tggactttaa ggtccccagg tgctcctatg tgcagggagg tttgggaacc actggggtgg 93960 
aggatgggtc ttttcttcag gacatcagct ttgattgagc ccctgtgacg tgaggctttc 94020 
caagcattgt ctctcccacg ttagtcccat gctttacgtg tcacacaatc tgagattctg 94080 
gaattggaag gtttcatgag cacatttatt ctacacaaaa gctgttttct catcattaga 94140
gacagtttgt tgagaaaaaa gatactgatt tctttatttt ccttgttctc tgaatgtaga 94200 
gataactaat aacttacaaa gtttatattg gtgtttctct gggtcttttg cattgaaatc 94260 
tttcctctga ctcttgctgc ctttcccaag tatgtcttta aagactgaca ttagagaagc 94320 
ttgtattttt gctgagtagt cttggaccag attttgacat actctaaaaa cttttaaaag 94380 
ttccagccac caagttaaac acgatgagca aaaccagcct tggccagagc atgagcaagt 94440 
atgacctcct ggtttggttt gagatcagtg aactggagcc tacaggagag taagtccaac 94500
ttaataaatt tttaaataag gcaaaatgtt tcaaattagg aaatgcaatt ttgaatcttc 94560 
ttgtatttgg gttcatgaat cattgtcact gtcaaattcc agggatggtt ttgtttttgt 94620
ttttgtttta atagagatgc agtcttgtta tgttgaccag gttggtcttg accactcctg 94680
gcctcaagca gtcctcctgc ttcgacctcc caaagtgctg agattacagg catgagccac 94740 
cacacctggc cgaattccag ttttgaattg atttttggtg gtcttggtat ttgaatcaga 94800 
tgtttatcag agaactgact gttgagcata ataatttgcc ttctcgttag caacagcaaa 94860 
atcatggaga caggcatctt gttgtttttt ggggtttttt ttgttttttg ttttttttga 94920
gatggaggct tgctctgttg cccaggctgg agtgcaatgg cgcgatctca gctcactgca 94980
acctctgcct cccaggttca agcgattctc ctgcctcagc ctcctgtgta gctgggatta 95040 
caggtgcgtg ccaccacgcc aggctaattt ttgtattttt agtagaggca gggtttcacc 95100 
atgttggcca ggctggtctc gaactcccga cctcaggtga tccacccgcc tcagcctccc 95160 
aaagtgctgg gattataggc gtgagccact gtgcccagca gcaccttgtt gtaagggttt 95220 
tatcataatg ggttttatta gtagtaaagc aaaaggagca tattgcatag aatatggtaa 95280 
agacttacat tactgatgtc aaggaggaca tagaagggca gaatattttc ctagaccagg 95340 
gtttctcaac ctcagcacta tttggggctg gataattctc tgttgtgagg ggctgtcctg 95400 
tgcatttaga atattttgca gcatccttgg cccctaccca caagatgcca gtagcagctc 95460 
cactctattt gtgacaacca aaaatgtctt cagacattgc caaatatccc ccagggaggg 95520 
catcctcagt tgagaaccac tgtcctagat caaagtgata ccatcagctt actctgtctc 95580 
gatgagctga gttagagtaa gtgattaatg taattgttgt gtataatcat ctctaaaatg 95640 
cttttgtaaa aacacaaatg aaagagatct tactcttagg acagaggcaa ggggctgtat 95700
cagttaatat aaagggtagg agtaaatgtt atcctaatgt tctcctatat gtaattctga 95760 
gaaccaccat catggtcatc attgggaaac acttattaag ttcctactat gtgtaggata 95820 
ctgtgttaag tgtggtatgt agttgttagt agtgacagtg ctttgaggaa tttgactttt 95880 
catagacaga cttatgatgt gattacaagt tgcattatga gggagaggta gttacagtaa 95940 
ttgtttctac tgaattaatt gctttctgaa tcttaaaaaa aactatagtc tgaaattatc 96000 
tgtgtttctt atctcctgtc aattgccatt gcacttctac attatttatt gaatacttaa 96060 
tatcaagggg ctggatgcaa ccaaaagatg ttactgtcct ttgactactc aagtctagag 96120 
aggattgaga catagatgtg tggaaaggat gagtaaatgc aaatgactgt atgattaaat 96180 
gccagcatgg gtgagaaaga catgggatca gaatttatca aaacaaagac caccaggggc 96240 
taagccttgc agatgaagta gcacttagtt ttttcttttg cgtttctatt ttttaggtat 96300 
atcccagctg tggttgacca cacagcaggc ttgccttgcc aggggacatt tttgcttcat 96360 
caggtactaa tgagggacca aaacaggcat cggagggaac actgaaaaga atttagaata 96420 
agagacggaa ttctttctta ttctctgttt tcatctctat ttcagagggt gggcggggca 96480
ttgtcctgtt gtcatttttt tttttttctg tgtctgtctg tagatatttc ttcctcagac 96540 
cttggcttat atttatcctt gcttttatat gtatttagaa tagctatttt gaaaaatatt 96600 
ttgtttcatt caattactct aatttgtgta gctcctatgt cacttttaat acttttcatt 96660 
tcgtgatgtt tggaagctct gtgctttcac accagaagtt cttaatagga atagaggatg 96720 
aaggtttcaa tggactaaaa atctggttat gaaacatagg tccgcttttg gtaaatcaaa 96780 
agtaggttca gtcagtgttc aaaagtactt aagacccaag tgggcatagg ggctagggat 96840 
accatacatt tgggacatac aatcagggca tggttgaata cttattatac aaggggctgg 96900 
gatgcagcca cataatctcc tgtgacagat tatggttatt cttaaaacca tagagaaaaa 96960 
gaatttttta aaaaagaaaa gaaaaccata tcagaacctg actgttcctt agtccctgac 97020 
acatctccct ttgttttgaa taacaatgtg gatagcattt tccgtctttc cacacctctc 97080 
cttcctgctc atcccccatg ccatgccata caggccagaa aacagtagac gagaggagat 97140 
aatagattgc ttcagctaaa ttgcaaccct gcttcattac ctagggcatc cagcgaagga 97200 
tcacagtgac cattatccat gagaagggga gcgagctcca ttggaaagat gttcgtgaac 97260 
tggtggtagg tgagtacgtt tcatcagcca aggatagaac caggacttac agagattttt 97320 
tttttttttg agatagggtc ttgctctgtc acccaggctg gagtacagtg atgagatcat 97380 
agctccctgt agcttcaaat ctcctgggct caagcaatcc tcctgcctca ggttcccgag 97440 
tagctgggac cacaagtgtg aggcactgtg ccctgttaga gatgttattc ttccctgttt 97500 
gcggtacatg gaatgtgtag aagaccatat tcggaaacac agactatgga gatgcagagg 97560 
ccttgagttt gaaactctgt attcctgttt cttcgctctg tcattagtga tatgttatcc 97620 
aactaactaa ctcagttacc agggaatggt tctgtggtag agtctgtgag agggcataaa 97680 
agaaatctct gctggctggg tgtggtggct catgcctgta atcccagcac ttttggaggc 97740 
tgagatgggt ggattacttg agcccagggg tttgtgcctg gccaaggtag caaaatccca 97800 
tctctacaaa aaaatatgaa aattagctgg gcatggtggc acgcacctgt agtcttggct 97860
actcgggagg ctgaggtggg aggatcgctt gaacttggga ggtggaggtt gcagtgagct 97920
gagatggcac cactgcattc aagcctggac aacagagtgg gactccgtct caaaaaaaaa 97980 
aagaaaagaa aaatctctgc taattgtgag cttgcatttt agttagggag acagcatata 98040
tatgtgcaga atgctttgat gaataaatac caaaatgagg acaaaggaaa atctgtgcta 98100 
agagagttaa cagtgaaaac aagccaggag gactgaatag attcaagaaa aattagaggg 98160 
tttttttttt ttttccagac acaagcattt tgagtaagag agtcaggaat atgtctaaca 98220 
tgcaaatgac aagaaacccc atggccagag tggagaaagt aaattaagga gtattggcta 98280 
gatagtgtaa actttagaat gacagtccga ggcaagagtt tggaactttt gttgccatgg 98340 
acctctttgg cagtaacgaa gccttagaat tatgttttca aatgtgagaa ataaagtatt 98400 
taggattaca aaggaaaccg ataatattga aatacagttg ttaaaacatt aagctcttct 98460
ttactgatgc atgaaataac aagatttagc agttattcta attaccataa tttccaagta 98520 
gtgatcaatg aaaaagatat cttgtgtcca ggcacggtgg ctcacgcctg taatcccagc 98580 
actttaggag gctgagatgg gcggatcaca aggtcaagag attgagacca tcctggccaa 98640 
catggtgaaa ccccgtctct actaaaaata caaaaattag ctggttgtgg tggcgggcgc 98700 
ctgtagtccc aggtactcgg aaggctgaga caggagaatc atttgaacct gggagacaga 98760 
ggttgcagtg agccaagaca gcagagtgag actctgttta aaaataaata aataggccgg 98820 
gcgtggtggc tcatgcctgt aatcccagca ctttgggagg ccaagatggg cggatcacga 98880 
ggtcaggaga tcgagaccat cctggctaac acggtgaaac cccgtctcta ctaaaaatac 98940 
aaaaaaatta gccaggcgtg gtggtgggcg cctgtagtcc caactgctcg ggaggctgag 99000 
gcaggagaat ggcatgaacc tggaggcgga gcttgcagtg agccacgatt gtgccactgc 99060 
actccagcct gggcaacaga gcaatactcc atctcaaaaa aatatataaa taaataaata 99120 
agatattttg cacatggaaa catctgtggt ttctactggg gacagagtca cgggtactgg 99180 
ccaattcttc tgtggtttat tgccttcctt tgtgattgaa ggagatgcca atttttagtt 99240 
aaaggtttgt gaaaataaag atgtaatttt tttccccatc caagctcaag gaccctctga 99300 
attctgtcca cagagtgctt gtgtctctct ggaccagatt aagaaccttg gggccaggca 99360 
cagtggctca cgcctgtaat cccagcactt tgggaggcca aggcaggcgg atcacttaag 99420 
ttcaggagtt tgagactatc ctagacaaca tggcaaaacc ctgtctctct acaaataagc 99480
tgggtgtggt ggctcatgcc tatagtctca actactcagg aggctgaggc tggaagcttg 99540
cttgaacctg agaagtcgag gctgcagtga accatgatca tgccactgca tgacagaaca 99600 
agaccctgtc aaaaaaaaaa aaaaaaagaa tcgttggaag ggagattatg ttctctcttg 99660 
tctggggcag cagtggtagt agcttttcaa tatttttcat aataaatttt tcttttttta 99720 
atttagtttt ttttttaact ttatttacaa agtactgaat gtcctttgac aaaataatgt 99780 
ttatatggaa ccctaatacc gaaacagaag agatgctttc tgaccagaga aggagaggga 99840 
ggatgcagag gcagcagtta ctatggcttc tctgaaggat cctatgatat atttttagtt 99900 
tgttttatta acaaactctg atgtagtatt taccacatgc caggtgcttt tctgaatgct 99960 
ccaaaaatac tagtttgttt aatcctcata gcaaccatat atggaaggaa ttgttattct 100020 
cgcttgacat atgagggaac tgaggctcag agaggttgag ccttaggctc aaagcctcac 100080 
aggtggtagg agttggaacc caggcagtcc agcactggag gccaagtcct taacgtctat 100140 
gcaatgctgc atccgctagt gttaacttga aaacagagtg taggattttg agcaagaaag 100200 
gggtgtaata aagtcagcat ggccaggtat ggtggctcat gcctgtaatc ccagcacttt 100260 
gggaggccaa ggcaggagga tcgtttgagc tcaggagttc gagaccagcc tgagcgacat 100320 
agtggggccc cgtctctaca aaaaaataaa ataaaataaa atagctggat gtggtgctgc 100380 
aggcctaaag tcccagttac tcaggtggct taggtgggag gatcacttca gcccaggaat 100440
tcaaggctgc agtgagctat gatcaaacca ctgctctcca gcctgggaga cagaccaaga 100500 
cattgtttca aaaaaaaaaa cacaaaaaaa caagaaactg tcagtcagtg gcttcagaag 100560 
ttgctgttag gtgagagtgg aatggattgg aacggggagg tgtgaatagc aggtcagaag 100620 
ttgctgttag gtgagagtgg aatggattgg aacaggggag gtgtgaatag caggtcagaa 100680 
gttgctgtta ggtgagagtg gaatggattg gaacggggga ggtgtgaata gcaggtcaga 100740 
agttgctgtt aggtgagagt ggaatggatt ggaacgggga ggtgtgaata gcaggtcaga 100800 
agttgctgtt aggtgagagt ggaatggatt ggaacggggg aggtgtgaat agcagggaga 100860 
ccagttaaat ggctgttgca ggttccagac cagagaggag gtggcctttt gtgatcaggg 100920 
caatgatatt gtgaataaaa gttgggatga atgtgggaaa ctctttgaat aaagaatcaa 100980 
tatggtttca tagttaactc aatgtgggaa gtaatggagt cagaccgtgg aggaccctac 101040 
aggtaaccaa atgacccttc cagctcctga acttacacag agatttaaaa gtctcttttg 101100 
ggcttaatga attcatgttt tgttttcaaa tgtgtccgtg ctctgttttt tttatccttt 101160 
cttttaggtc gtattcggaa taagcctgag gtggatgaag ctgcagttga tgccatcctc 101220 
tccctaaata ttatttctgc caagtacctg aagtcttccc acaactctag caggtgggac 101280 
acccagagca gtgtgaagaa gtccacactt gcaggcgtta attggtacac cgttaggtgc 101340 
cttattcatt aatgactccc agttcggaca aagaaattaa ctcccttctc ccttctagct 101400 
tcaaaaatct ctttatttct tcacctgcct gctgtactct ccaaaaagaa agaaagaaag 101460 
tattgcagat atttgtatgt gatcagttac tcttagagaa tggaagtgat cctgtcccat 101520 
gtgaagtttg aatagatgta acaagtgata aatgaaaatt ggagaaagaa aacagtatat 101580 
tccccaagga tttaaaagta craattaatt attgccaaga ttaaattttt tcctgtgaaa 101640 
tggttgctgt ggagagaaag gtttctctca ctccccaagt atcatggaaa tgtgccctct 101700 
gagataaaat gaagcctctt gtgttaagtc tttgctgctg ctgactatag gctccattat 101760 
ccttagtatt tgtttttcat ttatgcacaa aagcgataaa taatatgaac tcttatgacc 101820 
aagattggct gccataaata atagatttta ctttgttttt tattttaaat gtttcactta 101880 
aaattatctt ttataatcag gtgataacag acagttcgta aagtacatag ccattctgct 101940 
ttcctttgta agactagaac taaaccagct gggatcctct tacctcaccc aggaagggag 102000 
ttctcattgt ctcagaggct ccgtctctat atcctagggg ctgggaaaca ggagcggctg 102060 
cctcttctaa tgtgggctga gcacactgcc ttcccaccac aggactttgc tgtgcttcag 102120 
tctgctcctg aaaaacttgt ttacatccat gttgagagga gggagggaag gaataagacg 102180 
ataggcagaa atggaatgaa aggtttggac caggattaat ttctttctgt tgttttcttg 102240 
tattatagtt taccatagga aagagaaaaa agaaacaaan ttctgtttgc tggcaggggc 102300 
tgtggagaaa gagggaagga gtgagagggt acatatgtat gtctggggag ggggagtggc 102360 
agaaataaag aggggtgctt tgactatcca tgggcaaaat cacgagcaat actatttccc 102420 
tgaagtcagc agtaattctg catagctttc tgttttctgc caatgttcag ccgggtttta 102480 
ccaccacatt ttctttgtag ctatgaaaaa tgtattgtaa atttccctag ggtacttagt 102540 
aaaatataca gggaaagatg ggctttggtt gaaatactgg ttattctaat tttgcctcca 102600 
ttagagtaac tttaagaagt cctttaattc acgcccaaca acaggtcagg tgctacaaga 102660 
aatgaatttg gctttgttag aagcagctta tcttgtgagt tagaatgagt tcatcaaaaa 102720 
aggctgaagt caagaataac aacaacaaca aaaaacacct taaatatttc tgttctcatt 102780 
taaactatat aaatctacat taatttatca aacttttgct ggaaagccat ggccttttct 102840 
ttaattataa gatcatttac tggagttctg attgtctttc ttgcaaaacc acacccataa 102900 
ttcatcatct atgactttca gtttggtgga gcaagaatta gaaacctgtg aagaaaatct 102960 
aagttcaaaa tcctttaaat tttttatcat ttattaacat taatcattta aacattttat 103020 
cattttcccc tttcttttac atttgtctca cccaaactta acagttattt ttaaacaact 103080 
caccattttc taaccttttg aaaacattca acttagtttt cacaggaaca aaaattattt 103140 
atgttttcat atttcattga tttgcctaat gtttaatgaa gttagataat aataataagt 103200 
acaactggca taaacagatt tgggaccaga tactactgac atttgataag cactcaactc 103260

ttcggagcga agtgtctggg caaatgagac atctttgtag tttagaatgc tcaaggagaa 103320 

aaaa^aaaat UTalt^ " tagggata tggaaaacat aggt?aatat ggtaag^g" lolllo 

aaaatggagt ttggttaaga gaactttcgt tctgtttgtc taaggtttaa tgtttaaagt 103440 

tllttotToa alttlltt 99 gtatattatg tgtgtatctt agagggcttt gSaatacS 103500 
tgatagtggg gtataattga caagggacca ttcattattc cttttcttcc cagtcctaca 103560 

llllcllTai c TclT, 9aCt tggttttccc ttaactccac agcaacttcc ta?t a ttctc SsLS 
tctoS 33 ^ l 9 t a agg 99t act 9 atggtggtgg tctccacttt ccctcttatg 103680 

aSaafttot Jt f 9 gaagtctcag ^agagggacc catgttattg taagtcctc? 103740 
atggaattgt gttccatggg attctgtgtt aattgtctgt ccacttgaat tcactttqct 103800 

«gt a t g t Ca t a t lltlltlt tC \ ttatatata tattttttct gattctctcc ctiSSS iSaSSS 
ttgttttttt gctgcactaa ctaaagcctt ttctatgttt ttgcaggctc tttcttaata 103920 

llllllltTt Sf tt9 SI ttttgcacgt aggat??tag tllTtll^l 1039 o° 

tlnt^f^ Ctta ^ gtt 9 c tttgccattt taattttgta gtcccctccc catgtaaacc 104040 
SStES I 99 ?,? 9 * 9 * ! gaccctgtt ctgtacccta agaagcgtat agc?tcttaa iSJlM 
atgattttta gagtttcaaa ttattttttc attccatttg ctaaacttgt cagttttatt 104160 
cttagactct ttcaagtttt cctcttctac attacaaagt agaatgtttc tcccttca" 104220 
tcatacagaa tctcaggttt tgttttctgt tttcctct?t a?ctg?cttc tctccgtcaa 1042B0 
SSI??? 9 g9 " tattgC "tataaaac ctgagtccat tgcagtcttg caaaSg" }J«!S 
talltT»l C tCtgt " tgt cctgtgtggt aatagagaca aatttiggcta yttggatagc 104400 
agaaa ^ attt tatatatatt ttttaacatt tatctttctc ctg^Lgggc 104460 
tgcttgtttt atctgtttat tttcttcttt tggaatagga aaaatcctgg tttqatatta 104520 

gaSaa llttllTl f CtgtCtCatt tagggaa^a .,ttg..g£ ZlllTcll 2 
gacaatataa cctaaaatgt taaaaaaaat acaacataaa cgggctggqc tcaataacte 104fian 
ItltTcTatl , tCCC3gCaCt "gggaggcc aaggcgggtg ga^acglgg JcJggagttc 10 700 
TaltrTJ * ggccaa S at ggtgaaaccc catctttact aaaaatacaa aaa?? a gcct 104760 
ggcatgatgg tgggcgcctg taatcccagc tactcaggag gctgaggcag aggatcqctt 104820 
ESSE" ggCgg39gtt gcagtga 9 cc gagatcgcgc cactlgcattc "agcc^gggc lotllo 
tatTJnll gact = tgtct cgaaaaaaaa atacataaac atatttatat at?tga??gl lotllo 
lltotocltt ta aa S 3ttt tatCCCtCaa tttggacccc attcagttta catt?cagag 105000 
Sattttott v? aa f C3aa agggttcatc agatgtgacc atgatataag aaggtggct? 105060 
S?"gcaoa tltlllT' f gggtgttgC "tgttgccc aggctagagt gtagtagcta 105120 
a a a2ccSaa cS^™! aaagaaaaaa aaatgcggcc gggcacggtg gctcacaccc 105180 
Scctoacca SaSSS gaCCgag 9 c 9 ggcggatcat gaggtcaaga gatcgagacc 105240 
nltnll? * atat ?9 t 9 aa gccccatgtc tactaaaaat agaaaaatta gctgggcgtg 105300 
9t9 cctgtagtcc cagctactca ggaggctgag gcaggagaat ^gaac? 105360 
tgggaggcgg aggttgcagt gagccaagat cgtgccacta cactccagcc tggcgacaga 105420 
ItTJTnT gt f Ctaaat aaataaataa ataaataaat aaataaagtg cggtagca? 3 10548^ 
SqaaSata IT tgaattcctg ^tcaagca gtcctccctc c??^ 9 ^ 9 105540 
cgaataggtg ggactccaga cacgtgccac catgcctagc taatttaaaa tttttttata 105600 

?cc a c a ?t 9 ?a tct^c% tat tg T aC9Ct ggCCtCaaat tcctggcctc aagtga^tt £ 
gtSSStt aatto^ 9 t9 <* gggatt a <= a 93tgtga gctaccacaa ctgaccagaa 105720 
gttggttttt aattgttcat aaataataaa ttcaaaggaa tatatccttc aaagcagtca 105780 
?gaaa aa act S""^ ^ gatgctgc tgctgttcag aacatcttga gaaatccatt "S!J 
2tt£££ cctaaa^ % ta9gaggca tacaggaaaa tatgtcttgg gattcagccc 105900 
a?tatf^ a o ^ taaattct a ttgccagct tgattaaatt gacatttctc cccagtcttg 105960 
llalclcttt CHIT" 0 f tCtCaagaa ttgccagtgt aatggctcac acctgtaata 106020 
ggcaSaa aacttat^ , ta " aagatC ac "gagccc aagagttcaa gacctgcctg 106080 
ggcaatgtag aacttgtctc taaagaaaaa aaaaaattaa ttagccgggt gtggtggctc 106140 

ZllTtl 3t tCC " 9CtaC tc WM<* gaggtgggaa gatcac??g a gcccaggagt lo«oS 
ataaacSao " 9tg f Ct ^ "Scctgggc aacagagcaa gacccttgtc LaaaaLaa 106260 
ataggccagg cgcagtggct cacacctgta atcccagcac tttgggaggc cgaggcgggt 106320 
KaaSa lllT??* 9 ttCgagacca 9C«ggcc catgg?gaaa cLcatct^t lolllo 
tTcZtr, aaaaaatta 9 ccagacttgg tggtgggcgc ctgtaatccc agctacttgg 1064 40 
gcacca? 399 - TclZTollt gCt ! aaaccc Wggcgg. ggttgcagtg agccaaga?c 1065oS 
ataaaaSa 9gg ^ gacaag ^gagactc tgtctcagaa aaataaataa 106560 

gcScScaa at™™- 3attgaatta ccacctttgg tgggtattct aacagaactt 106620 
gctgcagcaa gtccaaagca aaaaaaaaaa aaaaagaaag aaactttttt taacaaaaa^ infifi»n 

aaa t a g a t1? a a a lllllTt? t ttCCCtCaat aCa " aa9aC tttagtggag ^"t?" 10° 7 6 
ctttgtgtta accaggtttc tgcaagtgaa attttctctt tatgttaagt gaaaocttao 106800 

cctgaaaaca gaaatgtgaa gaggcttttt tagacaagtt caaaacat?t gccS^g lotllo 
ITclllT 9 IZltllT** " tCagcagt 9gtcctggga ggtgccacgt Ltg^?^" 0 ° 920 
aacttaatga aggcagcagc attgatagaa atttctgtca aatatcacca aagtcataaa 106980 
tgccgtcact ctaagaagag gggtttattt aactttagca tggatgtatc caacctctaa 107040
aatgggacaa ataagccccg ccccttctag gtactttgac aaaaaSgga acacaatala }™JS 
agagcagacc tctggaggct gaggcaggag aatcacttga acccggj?g aggagg^g^ 
agtgagccaa gatcatgcca ccctactcca gcctgggtga caaaagcaag actc?qtc?c 107220 
aaaaaaaaaa aaaaaaaaca gatctcacaa tgatacagtc agttc?caag JgJSSt* W7m2 
cltlllllll CCCa ' t3tta S"tatgtct tLaccaaa? llltlltlll IVllTo 

ccttacacag agttttatgg acacatgaag ggtttcctcc tgtgcccagg qaocccaaaa 1074 nn 
actatagaaa ggccttcatt gtccatcatt tctgttcctg t?tgggtca? JSgtaac? 9 lOltln 
ccacacSc terete 9 " 9 ", tCtCtatC "ctgtgtc? ttgSg.^ S?c??gac S 7 20 
taga a ctaaa L ! 9 a f a 9 aca " tgagagagta ttaagtactt gaagaagact 107580 

tagatctaga agcagctatg ttagatgatc agacctcata cactgccact tqqaaaataa 107640 
ggtgtggtca tggggttcag tttcagaaat gtgttctctc tgttgatcgt t?ltgqaqqa Will 
agaaacatgg ggatgtgttt agagggagga tatgccagct acttatcaS ggaa^rtag ""S 
aaaaggcagc catagtgagg tcactatttg aaaaggagta agaaggaatg Jit^ggqcaq 107820 

ac^gtl^o ITallllT " Ctaggtga a tcaggtgag c? g tg?tcta tSgcgg^a IVllso 
actagtgatg gtggggcaga ccaggcatct tcccctccac ttgtcatagg ggtctqaqaa 107940 

at 9 ? a cacS? aqactr att gggaatcaat "gtaacact gc?tttaaa? ^agtSg? 0 0S0 
atttcacatt agacttgcag aactggaaga tttaatgtca attagagtta atqttqtttt 108060 
accagtagaa aaaaaacgat attagaacaa taacaacaaa atag?t?gtg SaJSS 
aaattaagtt tccatttgtt acataaatgt aagattaatt tttcaaa?gg IS£SS£ llllto 
taagctaaca aatggtattt gtctggctct agtgttaaca atttattc?? tttta"gaa ^82^ 
acatcagtta agaataacct tgtattcttt tatttaaaat aaggctctcg gccggca?aq 1SI300 
ctataatccc agcactttgg gaggccaagg tgggcagagc gctt'gagg?? lolleo 
aggagtttga gaccagcctg gccaacatgg tgaaacccct cctctactaa Latagaaaa 108420 

l£2StE llT " t9C 9tgCCtataa tcccagctac tcaggaggct gagSagag loltlo 
aatcgcttga acccaagagg cagaggttgc agtgagctga gattgcgcca ctqcactcca loZltn 

lltlllllll SSST aaaataaaa * l^Ultt tTattllll 

taaaataaaa taatgctctc tatacttaaa cccatctacc cagtttctac cattaatccc 108660 
ttcagatata tctgtggtgt ttctgaggcc cagcagcact ggattgaa^g "g^agcc SI72S 
SSJSE 9tgaagattg aa ^tgacaa gttaccaagc a?tcacttc? tgc^catc lolllo 
attaactgac aggagagctt acatagtctt tctttcttct ttcagtttct gtccatactc 108840 
acttatccat ttcatcttaa tccctttgat tgaccttact attt?tatgt J^JggSq lolloo 
aggcaggctt gtcaaaacaa aattaaaaaa agcacaactg gccaggcaca gtgg??ES lolleo 
TallT a tT,t C3gCaCtttg TO-Wcgag gtgggcagat catc?gaggt Lggagttcg JJSmJ 
agaccagcct gaccaacatg gtgaaaccct gtctctacta aaaatacaaa aattaaccaa lMOfio 
gcgtggtggc gggcgcctgt aatcccagct actcggaagg ctgaggcagg agaa^ac?? 109U0 
aacaaao 993 gg '? gaggtt ^agtgatcc gagattgtgc crtJLcS cagtatgggc HUol 
aacagagcaa aattctgtct caaaaaaaaa aaaaaaaaag cacaactaag tgttcc^aa 109260 

lltcllllll l a t rr tgg ggaCCtatCt tttcctccta 9 giga C taagg a llllll 

gttcctctct gttattagag tttgataaac tgaaagtgtc tgttggttgg cgtatqqttc 109380 

SSSSS S«Sr C !f aCCt9g -taacgtga ctttJgctS tKggatga' \ll\Vo 
ctctgcataa ctcccSt Slt"*^ , tCt3CCgctt 'WWctgtg tgggatagct 109500 
ctctgcataa ctcccttctt ctgaaccgag tgacacccta tggagaaaaq atctacatoa loqsfin 

ce gaSgt T.ToTtl ttggaacttc ac^atgcc 9 alSgtcag 93 

tctttc^n gtgCaggtta ca g«ggaag tggtcagcga agattccact tgttttgtcc 109680 
£2£S£c SS'? taa * tttttc attttattat acattgctaa c?cttc?gat 109740 
tctattcctg ggttctgttt actcaaaaaa tcacatgact cattgcagct 109800 
cc'Sgt 3 ? S:l C aCCatt9ttt gaagaatgtg tgattttt.. gaaaatgSS 1^9860 
ccaqcctn^ nJl 9 f ggaca 9 a 9 t 9 taatagtggt taatagccca gacccaagac 109920 
tcSScc ?caqttt^L C " tg ^ gCtgC cac ttacttc ctctatgtcc ttggagaaac 109980 
aScaS f"^ «~tateag gagaaacaac atttacctca gataaaaata 110040 
alltlll f taaggagtac tacctactct tgcacatatt atccattggg tggctgttga 110100 
"cqtatS SS? tCCttca9tt ctcacaactg ggactcgc?? t£a.!S£ "S"J 
ttcgtgtata gactctagtc acagtactct gcotgcactt catcagcacc taccctgtct 110220 
gctttcagct ggatcattgc atccagccgg ctgtcatcac caaggatgtg tgcatggtct 110280
tctactcccg agatgccaag atctcaccac cacgctctct gcgtagcct? t?tgg?agcg nSLS 
tSaS gtcacca9at tcgtaagttt ttcacacaag ttagcttcca gtg?^??^ uSJJS 
acaStta^ agaat ^" a ^ccttaagt attctcggct tac^gtctca cggt?agt?t llotlo 
ttllttl ? aa ^ C T t9 gtaactqaca cctggaatgt actgacttgc atgctgaca^ 110520 
cacg^aa gga<=gagtta ^ a ^aagg aaggaata?t acattgtgaa llollo 

tat?;^ 33 acaa " tt ^ c aaagactaaa tttgcaaaga cttttaaatg tcatcatcat 110640 
tatttctaac agagtaccat tgaaattttc ttctttttaa gatttgttat qcatatttaa 1107nn 
ggcatacaaa atatgtcagg aataatacag tgaaggccat acacagtggc Ictlgtcllt HoZ 
aatcccagca cgttgggagg ctgaggacag agaattgctt aagcctagga gtttgagacc 110820
agcctgggca acacagaccc catctccaca gaaaatttta ..Jtt.cS gg^gg'ga llolto 
aaattlltll ! agtc = Cagc tactcaggag gctgaggtgg gaggatcact ?gagc??ggg iiSS«S 
lllccltTc tcaaaa?t?a -actgcact ccagcctggg tgaaacag?g 111000 

tlttttlllt * caaaat " a aaaaaaagaa aagaaaaagg aaaaaaaaaa aaaagaaaga 111060 
aacaatacaa tgaatacctg tgtatcaccg cccagcttaa aaataagtgt tactaataca 111120 

itsssz iitttttiti ta % tac , tc : c 1 0 ° 

tt£»£*£ ? lt~ tC atatc 9 tctc tcttatagta agaaaagaca aatttttagt 111240 
Katotattt T I*** ttttctagtt tatcattcaa ttttattcac cctattgaaa 111300 
ac a t?t?o^ ac aa Ioo a9a gatga " tgg W-tggtgg cttatgccta taatctcagc 111360 
gccaaggtgg aaggatcact tgaggccagg agttcaagac cagcctgggc 111420 
aacataagga gaccctatct ctacaaaaaa acttaaaaat tagctgggta tagtgqtcta 111480 
cSSS"* Caggaggctg aggcagaagg atLca??ag tcSggagt? JuJJJ 

3 ^ taCat , CaC tgCactccag -tgggtga? acagggaaac luJJJ 
cSaqg^ca taal- att g a 9^ga aagcggactc ctctaggaaa aggtatgcta 111660 
aaataagaaa agagcggtga caaataagaa atgcctggaa ttatatttga 111720 
? agaagacact attttaagta gcagaaattg ...tttSct gtttttgtgc 111780 
tttaagagat taaacaattg ttttattttc ttcaaatagg aatcgagtca ctgqcattta 111840 
cgaactcagc ttatgcaaaa tgtcagacac aggtagtcca ggtaag^tc? SggaJtaa lu900 
ggaggtgata gttatctttg tgtatgtttc aagta?cctg tZtttJogct gggclcggjg limS 
aagt'ca^ tTnlT^ CgCtttggga ^ccaaggcg ggtgga?cac Sgagtccag 112020 
gagttcaaga gcagcctgac caacatggtg aagcccccct ttctaccaaa aatacaaaaa 112080 
ttagccaggt gtggtggcgc gtgcctgtaa tcccagctac tcaggaggct gaggcatgag llluo 
gccSqS lltltr" 99 caga ^ttac agtgagccga gat?gtgcca ctgSgca 22oS 
gcctaggtga cagagtgaga ctcagtctca aaaaagaaaa agaaaaaaaa atcaaattat 112260 
taagtcagat ttgttctgta ttctttttaa tgtgttttt. aactgctct? gttct?"^ llltto 
rtJSSSS IT 9 ^ Cttagcattt ctatgtttgt aaaccgaaac atgaacaaaa U23sS 
S aa " f 3ata9Ca "^attaat gtgagtttga gtcattttac agaatataca 112440 
al? a c aa ^ a gtcagggccc acaggcaaga cttagataag taaagaagac t?tgagta?g ll25o2 
"atataaat KSSSJ a "*J act,,t cca ^agtt tcttgaggaa ttt?t?t..? 112560 
It?*?*?*** tCa ^ gtattt "tatgtaaa accagcataa tactaaaaaa aattaacagg 112620 
aaataataga ggctaaagtt ctctgtcatc tgtcagccag tgagctactg agtqtqtqta 112680 

titiiititi titittiii: c9tctttt t a r attttta t?t?taggga cSsns: ""!; 

tccctttttt tttttttttg agacagagtc tcgctgtgtc acccaggctg gaqtataata 112800 

cSccSS ItllTaTtt a3CCtCtgCC tcctgggttc aagtga^tc? S^SSS 
cctcctgagt agctgggttt gcaagcaccc accaccatgc ccagctaatt tttgtatttt 112920 
tllaalX ggggt " Cac tatgttggcc aggctggtct cgaactcctg accLaga^g 
cc a ?aqt a tJ llatrlT aaagtgCtgg ^ttacaggc gtgagccatc acgcccggc? 1 3040 
atS a 2?a a tSc^i 9 , Caga f attt "tatttctg cattttcacc taacatcata 113100 
tttcttaaat tgcatcccag tcttcataat tatcaaagaa tqtaaaactc mun 
agtttttaaa aacgttcagt tgctttgcta tttaaacaaa atccca^ta t?ttcaqacc 11322^ 

'gee ITtlll JSSS? f ataa9aat tatta " tCg tggccaggcg" cggtggSca C uLI 
cgcctgtaat tccagcactt tgggaggctg aggcaggtgg atcacgaggt caqqaqatca 113^0 

IZllltT* 9gCtaaCaCa Stgaaacccg gtctctacta aaaatacaaa aSagctg U3400 
llZrl 9 , " Cgggcacctg tagtctcagc tactcgggag gctgaggcag gagaat^gca 113460 
899 aggcggagct tgcagtgggc tgagatcacg ccactgcac? ccagcc??gg 11352S 
caacagagcg agactccatc tcaaaaaaaa aaaaaaaaga attattaatt cgttttqacc 113580 
aTallll 9 3 attgaatttt ttttctgaaa caaaaacttg tttcctcttg tSaggtat U36 4 S 
gcagagaagg agaagaaaaa tcttagatac gtcagtggca tatgtgcggg gagaaSgaa 113700 
cttagcaggc tggcggcccc gtggagacag cctcatcctt gagcaccagt gggagctqqa U37eS 
gaagctggag ctcctacatg aggtatccag gggcagggtt gt?cagatgc lagaactSc 13820 
a? a tt a ?t a c ITaTaTl " gtCttCCt ccccgctgct gcatgtVc ata'gcttt^c 3 8 S 
Itllr. t cctgaggtct taacgagctt tgtgtttgct atagcagtag tattgatctt 11394 0 
^ aaaaacccgc cactttttgc tgctgcgtga gagact^gg? gacagcatcc 114o2S 
ccaq'atctc ^^ 9aC ^ 9 " atccccca ^cctcagcag tgggaccctc Igcacctcca 114060 
atC ^ C f ct "gatc tcaaccacta cctttgaaag cgccatcaca cctagcgaga 114120 
aa a cc^ 9 l a ga , Ca , tC9aaa gcctggtg * a ccgagagaaa gagct'ggc?: 80 
ctc a cqq?t? IttS^ t acgtcttccc acaaggctcc acaaactagc 114240 

tatS^ attcattttc aacacctttg ttcgaggtgt ttgaaggcct gtgataatac 114300 
ITJr,, f 3 gaaataaaaa gacgcagttc ctaccctcaa caagtttaca gggccaggca 114360 
cc«qqaa? a HT ***** CCCacCactt tgggaggctg aagcaaaggg ^t'gctt'gag 42S 
SSttloaS! aaagacaagc ^aggctaac atagtgagac cttgtctcta caaaaacLa 114480 
aattaggcca ggcatggtgg ctcacacctg taatcccagc actttgcagg gctgaggcag 114540 
gtggattacc tgaggtctgg agttcgagac cagcctggcc aacatggtga aaccctgtct 114600
ctactaaaaa tacaaaaatt agccaggcgt ggtggcgagc gcctgtaatc ccagctactc 114660 
aggaggctga ggcaggagaa ttgcttgaac ccaggaggcg gagattgcag tgagccgaga 114720 
tagctccact gtgctccagc ctgagcgaca gagtgacact ccatctcaaa aaataaaata 114780 
aaataataaa aaattaaaaa attaaaaaat ataaaggttg cagaatatta ggtcacaagc 114840 
atataataga gatgtgtgtt tcaatgctat gggagtcaga ggagaagtgt ctatggtggg 114900 
gagctcatga gctgaccttg aaaggatgcg caggagggag gcagcagtgt cgacacctgc 114960 
gtgatcagga gggaggcagt ggtgtcctgc gtgatcagga gggaggcagc ggtgtcttgc 115020 
gtgatcagga gggaggcagc ggtgtcctgc gtgatcagga gggaggcagt ggtgtcctgc 115080 
atgatcagtc ttggggaatt cacgtgggac tggaacagga ggtgtggagt gaagaatctt 115140 
gtatactgca ctgaagagtt tagagtggat gatgggagca aggcagaaat atgttaaaca 115200 
ggtatttgtg ttttagaaag atttctggcg aggggcggtg gctcacgcct gtaatcccac 115260 
cactttgaga ggccgaggca ggtggatcac aaggtcagga gattgagacc atcctggcca 115320 
acatggtgaa accccgtgtc tactaaaaat acaaaaatta gccgaacgtg gtggcgtgcg 115380 
cctgtagtcc cagctactaa ggaggctgag gcaggagaat cgcttgaacc caggtggcgg 115440 
aggctgcagt gagccgagat cgcgccactg cactccagcc tgggcaacag agtgagacac 115500 
catctcaaaa aatatatata tttctctggc agcaataaag agaatggatt gatggtgaag 115560 
agatacactg aggaagggag aattctttat ctttattaca gtaatctgat gagaaatgat 115620 
aaaagctttt taaaatttta ttttttttga gatggagtct cactcctgtc acacaggctg 115680 
ggatgcagtg gcatgatctt ggctcactgc aacctccact tcccaggttc aagcgattct 115740 
ccttcctcag cctcccaagt agcttggagt acaggcatgt gccaccacac ctggctaatt 115800 
tttgtatttt tagtagagac agagtttcac catgttggcc aggctggtct tgaactcctg 115860 
acctcaggtg atccacccgc ctcagcctcc caaagtgcta ggattacagg tgtgtggacg 115920 
cccagccaga taaaagtttt ttaaagaagg cagaggtggc tgggtgcggt ggcacacgcc 115980 
tgtaatccca gcactttggg aggctgaggt gggcggattg tgaggtcaag agatcactat 116040 
gatgttttta ttttttattt ttatctttat taaatttttt ttttttttga gatggagtct 116100 
tgctctgtgg cccacgctgg agtgcagtgg tgcgatctcg gctcactgca gccttcactt 116160 
cccaggttca agcggttctt ctgccttagc ctcctgagta gctgggatta caggtgtgca 116220 
ccaccatgcc cagctcattt tttgtaggtg ttttttttgt ttgtttgttt gttttttgga 116280 
gaggggatgg agtcttactc tgctgccagg ctagagtgca gtggtatgat ctcagctcac 116340 
tgcaacctcc acttcccggg ttcaagtgat tctccttcct cagcctcccg agtagctggg 116400 
attacaggtg cccaccatga cgcccagcta attttcttat ttttagtagt gatggggttt 116460 
caccatgttg gccaggatga tctcgatctc ctgacctcgt gatccacctg cctcagcctc 116520 
ccaaagtgct gggattacag gtatgagcca tcacacccag ccagaaccat ggtgttttta 116580 
gtaattcact ttggtaatct tgagaaaggt ttttttaagc cacttgcttt atctttatta 116640
gacatacata gaacttccct tactatatac ttagttctaa tttctagagc tgtttgggca 116700 
acatcggagc tgcactgtgg gtttaggaca tgttatttcc cctcctgcct tgtatccagg 116760 
gcttcattta gtcattgagg tttttgtttg tttgttttta aatttactga tttattagaa 116820 
aggacagtgc aaaggagacc aatgaacggc agatggacga atgcataggg agaggagatg 116880 
tggagcttcc atgccctccc cgggaactca gcccttcagg agcctccacg tggtcagcta 116940 
tcccaaggct cctggctgtt gacattttat gttcagcttg cacttctctt tctcttcttt 117000 
agtgcctgca acttctcacc cacactttca acagagaatt cagccaggtg cacggcagcg 117060 
tcagtgactg taaggtgagc acattgactg taatttttag ccagtatgtt gataactgat 117120 
ttctccacag cagcccagat tacctattcc tggtttttgc tgcttttaag caacttgtca 117180 
tgggcatagc attgtatttg aaaatttatg acatactgct ctggtattca ttctaatttt 117240 
tcagagtccg aacactgact tctgaagata aaagtactcc tttgtgtctc ttagagtgat 117300 
tatcagatgg gaaacatttt ggctttttca tgactccttt ggaggagaat attctatggg 117360 
gaggtggtat gttattcttt gccagggtac aaggaaaccc tgaggttcct ggtggcataa 117420 
agttttattr acttcaacaa agagtgaagt aaacacttca gagaatttct gtgttattca 117480 
ctcaattcta gtcagcttga ccttaaaccc tcccggctca ttcctgcctt ggggtctttg 117540 
cacacctact atttctttgc ctggaattgc ctttctccag gtattgccat ggttggctcc 117600 
cttacctctt tcatttttct actccattat cacctcctgt gagtcgactc tgttaacacc 117660 
tgataataaa tctgacagct gtctgaaact ttctacctac ccctctttcc tgctttattt 117720 
tttgctatag cactaattac tacctgatgc tttttaaagt gtttgttgag tggtttattg 117780 
tctgtcttaa gctccaggga ggcagatcct ctgccatatt attagaacaa tgcctagtac 117840 
atggaaggtg caccactgta ataccactta tattaattca ggggctgggt gtggtggtgc 117900 
atacctgtaa tcccagcact ttgggaggcc aagatgaggg gatcgcttga ggccaatagg 117960 
ttgagaccag cctgggcaat atagcacaaa taccctatct ctacaaaaaa atttttgaaa 118020 
aattagccaa atgtggtggt acacacctat agtcctagct actcaggagg ccaaggcagg 118080 
attgcctgag cccaggagtt tgaggctgca gtgaggtatg attgtgccat tgcaccccag 118140 
cctgtgtgac agtatttttt tgtctcttaa tgataacgga ctggggggct ggctgcggtg 118200 
gctcatgcct gtaatcccaa cactttggga ggccaaggcg ggaagatcac ctgaggtcgg 118260 
gagtttgcga ccagcctgac caacatggag aaaccccatc tctactaaaa atacaaaatt 118320 
agccggatat ggtggcgcat gcctgtaata ccagctactc aggaggctga ggcaggagaa 118380
tcgcttgaac ctgggaggtg gaggttgcag tgagccaaga tcgtgccatt gcactccagc 118440 
ctgggcaaca agagtgaaac tccgcctcaa aaagaaaaca aaaaacaaaa aaacccccaa 118500 
aataatgggc caggcatggc ggctcatgcc tgtaatccca acactttggg aggccgaggt 118560 
gggcggatca cctgaggttg ggagtttgag accagcctga ccaacatgga gaaaccctgt 118620 
ctctgctaaa aatacaaaat tagccgggcg tggtggcaca tgcctgtaat cccagctact 118680 
caggaggctg tggcagaatt gcttgaacct gggaggtgga ggttgcggtg agccaagatc 118740 
gcaccactgc actccagtct gggtgacaga gcgagacttc gtctctccaa aaataaataa 118800 
agataatgca ggagaggcca gtgacagtgg ctcacatctg taatcctgcc agcttgggag 118860 
gtcaaggtcg atggatctct tgaggccaag agttcaagac gagcctgggc aacatgacaa 118920 
aaaccccatc tccacaaaaa aatacaaaaa ttagttgggt gtggtggtgt acacctgtag 118980 
tcccagctac tcaggaagct gagatgggag gatcacttga gtcctggagg ctgaggctac 119040 
agtgagccaa catcgcacca ctgcactaca gcctggtaga catagcaaga ccttatctcg 119100 
aaaaagaaaa taataataat aataataaga agaagaagaa gaagaagaag acatagcaag 119160 
accttatctc gagaaagaaa ataataataa taataataat aataatgaca acaataataa 119220 
taatgcagga gggcagtagg aacataaaaa accaaccttg ccccttttat aaatgtcttg 119280 
attcctggtg tggcacttag gccgtaaaat atttgtaggc ttgttttttt tcttttttga 119340 
aaatattttc accagttgtg gtggtacacg cttgtagtcc caggtactca ggaagccagg 119400 
gcacaggagg atcgctggag cccaggagtt ggaggccaca ggaagctagg attgcatcac 119460 
tgccctccag cctgggcaat agagcaaggc cttgtctctt aaaattttac ctatgtattt 119520 
atttattgat aaattatgta taggcttgtt ttcttaaatg accttctgaa acaagatgta 119580 
tcaaagtgag gttctggttg ttgaaactcc ctattgttac tgtagaggca gcttttttca 119640 
gctgaatgta acttgtagtg ttcggtttgc ttccagttgt ctgatatctc tccaattgga 119700 
cgggatccct ctgagtccag tttcagcagt gccaccctca ctccctcctc cacctgtccc 119760 
tctctggtag actctaggag caactctctg gatcagaagt aagtacccag atttcactga 119820
gagaagtcaa tctaagaacc aaggtaaatg tcaaccttcc tctagctcaa tggttcttga 119880 
ctgaaggcag caccttctcc ctaggcagca tttggaaatg tgtgggagcc tttttggcta 119940 
tcacagtgac tggagaggtg ctagctactg gcatctcttc acatcatagc ttactaacct 120000 
tctgcatcat caactcatca tgactgtctt cttcatccct ttctgctagt tttatcctcc 120060 
attcctaccc ctgcttgtct ccccagtcat aataccagac ataggaagtg ctatcttggg 120120 
aaaaatattt tgaaactgtt ttggaaaaaa atattttgat tccacagcaa attatgtagg 120180 
taagaaaaat aaagacaagg ccaggtgagg tggctcccag cactttcgga tgccaaggca 120240 
ggaggattgc ttgaggctgg gagttcgaga ccaggctggg caacatagcg atatgctgtc 120300 
tctacaaaaa ataaataaga tacccaggcg tggtggtctg tgcctgtggt cccagctact 120360 
cgggaggctg aagtgggagg atcgcttgaa tccaggagtt caaggttgca gtgaggtttt 120420 
attattgcac cactgcattc caccctgggc gacagagcaa gaccctgtcc ctaaaaaaaa 120480 
taaaaaaaag acaaatgtag tctagagtcc atgattcatc atcataatcc gttctttgca 120540 
acacaccctc ttgtgcatct ccatcaatta tgctcacctg gttatcatcc acctgtctat 120600 
cttattgtac caacactgac acaggcaaag gtggctggag aaaataacac aaccaggtaa 120660 
tttcattaaa tttaggatgc cacagactgt agggactatt ctgtgccact aagagaagaa 120720 
gaaaaagcac tgacaactaa cttaagcatg ccatcaatta taagatgaat cccaatttaa 120780 
gaaatggtgg ctgagtgcag tggctcatac ctgtaatccc agcactttca ggggctgagg 120840 
tgggaggatt gcttcagccc acaagtttga gaccagcctg ggcaacatag ggagaccccc 120900 
atctctacaa aaaataaata aaatagttgg acatgatggc atgcgcctgc ggttccagct 120960 
actcaggagg ctcacatggg aggattgctt gagcccggga ggttgaggct gcaatgagcc 121020 
atggttgcac cattgcactc cagcctgggt gacagagcaa gaaatcagcc aggcataggg 121080 
gcatgtacct gtaatcccag ctactgggga ggctgaggca ggaagattgg ttgaacctgg 121140 
gaggcggagg ttgcagtgag cc 121162 

<210> 2 

<211> 10884 

<212> DNA 

<213> Homo sapiens 

<220>
<221> exon
<222> 1601..1750
<223> exon

<220>
<221> exon
<222> 2139..2331
<223> exon

<220>
<221> exon
<222> 2540..2658
<223> exon

<220>
<221> exon
<222> 3830..8884
<223> exon

<220>
<221> misc_feature
<222> 8885..10884
<223> 3' regulatory region

<220>
<221> allele
<222> 2397
<223> 12-602-196 : polymorphic base C or T

<220>
<221> allele
<222> 2551
<223> 12-602-350 : polymorphic base A or C

<220>
<221> allele
<222> 4500
<223> 12-587-379 : polymorphic base A or C

<220>
<221> allele
<222> 6119
<223> 12-596-124 : polymorphic base A or G

<220>
<221> allele
<222> 10130
<223> 12-808-52 : polymorphic base A or G

<220>
<221> allele
<222> 10153
<223> 12-808-75 : polymorphic base G or C

<220>
<221> primer_bind
<222> 2203..2221
<223> 12-602.pu

<220>
<221> primer_bind
<222> 2600..2620
<223> 12-602.rp complement

<220>
<221> primer_bind
<222> 4479..4499
<223> 12-587.rp

<220>
<221> primer_bind
<222> 4858..4878
<223> 12-587.pu complement

<220>
<221> primer_bind
<222> 5996..6015
<223> 12-596.pu

<220>
<221> primer_bind
<222> 6423..6443
<223> 12-596.rp complement

<220>
<221> primer_bind
<222> 10079..10098
<223> 12-808.pu

<220>
<221> primer_bind
<222> 10523..10543
<223> 12-808.rp complement

<220>
<221> primer_bind
<222> 2378..2396
<223> 12-602-196.mis

<220>
<221> primer_bind
<222> 2398..2416
<223> 12-602-196.mis complement

<220>
<221> primer_bind
<222> 2532..2550
<223> 12-602-350.mis

<220>
<221> primer_bind
<222> 2552..2570
<223> 12-602-350.mis complement

<220>
<221> primer_bind
<222> 4481..4499
<223> 12-587-379.mis

<220>
<221> primer_bind
<222> 4501..4519
<223> 12-587-379.mis complement

<220>
<221> primer_bind
<222> 6100..6118
<223> 12-596-124.mis

<220>
<221> primer_bind
<222> 6120..6138
<223> 12-596-124.mis complement

<220>
<221> primer_bind
<222> 10111..10129
<223> 12-808-52.mis

<220>
<221> primer_bind
<222> 10131..10149
<223> 12-808-52.mis complement

<220>
<221> primer_bind
<222> 10134..10152
<223> 12-808-75.mis

<220>
<221> primer_bind
<222> 10154..10172
<223> 12-808-75.mis complement

<220>
<221> misc_binding
<222> 2385..2409
<223> 12-602-196.probe

<220>
<221> misc_binding
<222> 2539..2563
<223> 12-602-350.probe

<220>
<221> misc_binding
<222> 4488..4512
<223> 12-587-379.probe

<220>
<221> misc_binding
<222> 6107..6131
<223> 12-596-124.probe

<220>
<221> misc_binding
<222> 10118..10142
<223> 12-808-52.probe

<220>
<221> misc_binding
<222> 10141..10165
<223> 12-808-75.probe



<400> 2 

gcctcccagg ttcacgccat tctcctgctt cagcctcccg agtagctggg actacaggtg 60 

cccgccacca cacccagcta attttttatt ttattttatt tatttattta cttttgagat 120 

ggagttttgc tctcgttgcc cacgccagag tgtagtggtg caatcttggc tcactgcaac 180 

ctccaccttc cagcttcaag tgattctcct gcctcagcct ccaaagtagc tgggattaca 240 

ggcacccgcc accatgccca gctatttttt tttttatttt tagtagagat cgggtttcac 300 

cattttggtc aggctagtct caaactgctg accttatgat ccacccgcct cgacctccca 360 

aagtgctggg attacaggcg tgagccactg cacccggcca cacccagcta attttttgta 420 

tttttagtag agatggggtt tcaccgtgtt agccaggatg gtcttggtct cctgacctca 480 

tgatctgcca gcctcagcct cccaaagtgc tgggattaca ggcgtgagcc actgcgccgg 540

cctttttttt ttttttttga gacagagtct cactctgtcg cccaggctgg agtgctgcaa 600 

cctctgcctc ccgggttcaa gtaattctcg tgcctcagcc tcctgggtag ctgggattac 660 

aggggcctgc caccacgccc agctaatttt tgtattttta gtagagacgg ggtttcgcca 720 
tgttggtcag gctggtctca aactcctgac ctcaggtgat ctgcctacct cggcctccca 780
aagtgctggg attacaggtg tgagccactg cacctggcta acttgatttc aaaaaaaaaa 840
aagaatgaag tatgagaaaa agttaaaatg ggctttttag atttagtgag atacaatgct 900
atgttactct acaaagatat ctagcaatac agagtatagg aatttatttc tcacacagta 960
gtcaagaggg ctggtccagg tccaggtcag tggtgcaggt cttctctgca cagccgttca 1020
ggaaccctgg gtcttcccat ttgtttgctt tttctgccat tttctaaggc agtgggcctc 1080
tgagacttca gtgtacatca tcatcccctg ggaggcttgt tgaagccccc atggctcaac 1140
cccacccaaa gtttctcgtt ccgtaggtct ggcttggagt ctaaaaattt gagtttttga 1200
acagtttccc aggtgaagct gatgctgctg gctgggagca cactttgaga aacatcgctt 1260
ttggtgtggc catgggacta aaaagaaaca tataaaggag tagagttcct attccagata 1320
atgtgttcag agggctttaa tataggcaga gagattgtgt gacccagtgg ccatcaatga 1380
tcaatttcct ttcataactt cagaaggatg acaccagttg aacgcagagt taactttctg 1440
aagccagctt gtctcctcag ctgagctctc tgcaactcaa gtttgcagct ggggtttagg 1500
gagggccatt gtgttcctcc cagtgaaaca gtactcagta tgccttgatt gtaactgatt 1560
ctcttgttac ccttttcttt ctaatctctc tattttaaag gaccccagaa gccaattccc 1620
gggcctctag tccctgccca gaatttgaac agtttcagat tgtcccagct gtggaaacac 1680
catatttggc ccgagcagga aaaaacgaat ttctcaatct tgttccagat attgaagaaa 1740
ttagaccaag gtgagtacta tattgagcag gaatgccagc tataaaaaac aaatccacag 1800
gaagaagtga ctggccagct cttccctgtg aggtctgtac tgtagtctga tgcaatctgt 1860
actgtagtct gatgcaagtt gagtctgggg tgagagggcc agcctgggtt tctccagttg 1920
ggtctcataa tgcatctgca ccctggcgca gcatgttgcc atggcacggt ttcgcttttg 1980
ccgtattact ttggcagata agattctagg ccaaataggt catttgcctg tttcatcgaa 2040
tctaaggact tggcctaagt gtggcgtatt ttgatggtcc tcagcacgat tattttcttt 2100
ttgtgagaaa ctaacttttg tcatattgtc gtttttagct cagtggtctc taagaaagga 2160
taccttcatt tcaaggagcc tctttacagt aactgggcta aacattttgt tgtcgtccgt 2220
cggccttatg tcttcatcta taacagtgac aaagaccctg tggagcgtgg aatcattaac 2280
ctgtccacag cacaggtgga gtacagtgag gaccagcagg ccatggtgaa ggtccgtcct 2340
gccctgcctt ggtttcttat tgccacgtgt gcccttctct tttgatttct gcactcyctg 2400
ttacgtggtg tgaaatttcc ctattctctc tggttttgaa cttcctttgt gaagtgctat 2460
atgccttggg aatatggcaa ggcagtcccc attgctgtct ctgtagtaac tttcttgtct 2520
acctgcattt ttctttcaga caccaaacac mtttgctgtc tgcacaaagc accgtggggt 2580
ccttttgcag gccctcaatg acaaagacat gaacgactgg ttgtatgcct tcaacccact 2640
tctagctggc acaatacggt aagaagtttt gttgttgttg ttgttgtttt tgagacggag 2700
tctcactttt tctccctgaa gtgcagtggc ttggtcttgg ctcactgcaa cctccgcccc 2760
ctggttcaag cgattctcct gactcagcct ccccagtagc tgagactaca ggcacgtgcc 2820
accattgcct ggctaatttt cgtatttttt gtagagacgg ggttttacca caatggccag 2880
gctggtctca aactcctgac ctcaagtaat ccacccacct cagcctccca aagtgctggg 2940
attacaggcc tgagccactg cacccagcct agaagttgtt ctgtgttttc tttctctctt 3000
tggcttttgg cttcaggaat atcaggataa ataggaaagg aagaattttt cttttctttt 3060
ttcttttctt gttttttttt tttttttttt tttttggaat ggagtttttg ctcttgtgtg 3120
gcccaggctg gagtgcagtg gcatgatatc ggctcaccgc aacctccacc tcccaggttc 3180
aagccattct cctgcctcag tctcctgagt agctaggatt acaggtgccc gccaccacac 3240
ccagctaatt tttttttttg tatttttagt agagatgggg tttcaccatg ttggccaggc 3300
tggtcttgaa ctcctgacct caggtgatcc accctcctcg gcctcccaaa gtgctgggat 3360
tacaagcgtg agccaccaca tctggcttga aaggaagtat ttttcaaaat aaatttaaaa 3420
gagaggaagg acaggagtgt aaattggttg tggggtggta atgaggcctc tgcatcttgt 3480
aaggcattgt catgttctga accaccaggg atactctggt cacacatgtg caccaattat 3540
tttccttctt tgtctttttg ttctcaagag ttctaaagat ggggctcagg ggagagagta 3600
aagggaggac agtgagaacc ttaatggtct cccaacctcc agccctgtga ttcagcctca 3660
cgttctgagg gcttgtgtca cagggtcccg gagtagacgg cttagaggac taggcctgcg 3720
gtgccaaata ggaagatgct ccaggaggga gcctcagagg gagaggactg tgaggggaca 3780
agacttgtac ccagacttct aatcctacct ccccgtttgt cccccatagg tcaaagcttt 3840
cccgcagatg cccgagccag tcgaaatact aagtgactct gccgagtgcc ctcactcgcc 3900
ttcgagagat aaagaaagcg ttacctctca tttctctttg tgattcttga cggtgactct 3960
tgtatgtaat cctgtggctt aactacttct ccctccttgt ccagcacttt tctagctctc 4020
ccgttcccca tctccattgc tctgtactct tttctttttt cttgtgctga gaatctcgtt 4080
agtagcatgt ggcctaacaa aaggaaaaaa tgtttttaaa cacacacaca cacacacaca 4140
cacacacaca catacacaga caaaaacaca aaaactctga ggggatctgg tgaatctcca 4200
aattattgtg ggtgtacttt ggcttccttt tgtatgatag gtccccatca tgaccacctc 4260
tgatgtctgt gctgctgtca ccaggcacct ttgtttttca agacaacata cttttttttt 4320
cttttctctg tttgtgatat cactttaatt tttcttgggt ggcttagaga ctaagggagg 4380
agacatctgg cctttttaga acctgagagg aaaaaaagag tctttttttc ccctctgtct 4440
ctttttgcca tggctaatcc ctgcatttcc attcagggaa aaggtggtag tgagcatagm 4500
actgcaacag ttatattctg agtcaaagtt ggggcttttt acggcataat tatggaattt 4560
ttatttactg gtagagagga gacgagaggc tttttcagtg ggcctgggac agtggctgct 4620
cttgactttg tgtgaaggga aatgccaagg atgcttctgg tggacttcag gggaccccag 4680
ggtttggccg tgggccgtga tggcagcagg cggtgggatg cttgtagctc ctcacagcag 4740
gattcctgcc cactgttttt tctctgttgg gagggaagct cttttctagg agtgtctcag 4800
ttctgctttt ggcattagtg atggtggtgg tacagttgga attagtgcca tgtcatacac 4860
aaatgttcca caaggcggga gtgtttcact ttctggtgat aaacttgatg gtcattgtta 4920
tgattaagat aatgccgggc aggccgggca cagtggctca cgcctgtaat ccaagcactt 4980
ggggaggccg aggcgggcag atcacgagat caggagttca agaccagcct ggccaatatg 5040
atgaaacccc gtctctacta aaaatacaaa attagtcggg tatggtggca catgcctgta 5100
attccagctg cttgggagcc tgaggcagga gaactgcttg aacccaggag gcagaggttg 5160
cagtgagcca agatcgcgct attgcactcc agcctgggtg acagagcaag actctgcctc 5220
agaaaaaaaa aataataata atgctgggta gtgaccttgt gattgttaca gctccctttg 5280
atcaaagaaa tatagctttc aggcataaac ctggaagtct ccctctgaat ccagcagttg 5340
ttttcattga tgcttgtcag gttgaagatg ctttcagtga tgctctctat actcataaat 5400
aagcaaatgt ggcaggcttt gctttctgga tcccaggatt aaaactaacc gtgaccacta 5460
ctccaaacaa aacacaatat gcctaggggc acggatgaac gtccagggag cccgggcccc 5520
aggctttgtt gcgtgttccc tgctcctctc catctggtgt ggaaacactg cccagggaga 5580
aaggaggaag ctcactgtgg acagtcttct ttccttctga cagaccaggt catctggctt 5640
ccgagatcat cagagaagat aagtctgtct ctttcagctg ccagtaagtt ttccaggatg 5700
agaggggaaa aagaaagcct ccagtgactt cagttgcttt gccagttgtc ttgggattgt 5760
tttacaccat cctttacttc ccttgctcag acctctctgt ttcaccattg ctcaggcatt 5820
caggaaagta tctgctcact cccacttggt gagtcctcgg ccttgaggtt gctgactctc 5880
aggcgttagg cagctggatg acttcccgct tcacgcagca aaggccaggg gcttgcgcgc 5940
ctctgcagag ttgttgctag ggagacttgt gtcatcatcc acaaccttgt ttctcacttc 6000
ctggttgggc tcatctctga agaacaggtc tcccagcttc gctccttatc actgcattgt 6060
gaagaggagg aaaagtgaat cacggagaga gaaaggaaag gatagaatca caggctgcrt 6120
ctgcacctga aaagtgaccc gcggaaactc tatggcggat ttttttttta actttcttct 6180
tcctgttaaa acataggtca ctaactgtga tgttatttgt tttctaagtg gtatgtgaga 6240
ttttctaatg tagttagaag tttcattgtc tgatggacac aatatgccct tccggttcta 6300
ttcaaaccag caggatctgt cggtgcttag agatggctgc ctggactgga atcaaatcta 6360
atttcaggga aatgaagatg gaatttgaag gtcactttta aaattaagtc attgatgctg 6420
ctgttacaga gtgtgacaga ggatccatgt ctgtgacaca ggacggtggg aagcctgaga 6480
gagagtgaaa ttatgtgata cactgaaatg acttttgttt ttcttctaac tcatacaaaa 6540
ctggtttgga aagtctttgc tttggaagcg tcagacatta gaacaggcca aactggactg 6600
tctgttcata gcgtgcctga ataagaaggc ctcttaggga gccagaggga gcagagtggt 6660
cgtgtcctgc gtgctcttca ccctctgggg cgcccctgct gcggctggca ggtgcagaca 6720
gcctttgctg gtccccagca cgtccagggt gggtgctccc ttgcccgaca gaaccatccc 6780
cactgtgagg ctgtgagaga tttgtggcag gaactgttta tgaggctcta gttgttgctg 6840
ttgtggcggg aaagttaaga aacatagccc ttaaggaaac cacctttatg tattttctta 6900
aagcacgcct ttaaataagc aaaaacttta aaaggcagga aagagaattc ttaggcaaat 6960
tcagagaaat aagtgctagt taatactaat cacctcctcc tctgtctctc atcctccttt 7020
ctcccatcaa agcaaaatat ggcctcacca ccagccccaa atcagtgctc agaccctctc 7080
tgtgtctgtg tgccctcctg ggagtcagtc agcgctcagg ccaggactgt gcagggccag 7140
ccagcccatg cgctagtcag gagcacaggc aaggggtgct tgtggcagtg gccgggcacc 7200
tgagccccag ctcgttgtta aacgtgctga cggcaagggg caatggagtg agtttcccaa 7260
ctaagaaacc actattatat attttttccc ttcagtcaca tagacttcag acaactctcc 7320
tattttttat ggatttttca gctcatttca gatgaaggaa ctaagtcatt gtgaactgtc 7380
tcttgagatc taaaaacaag atgacttttc ctggcacata ttccaaagca aagactttgt 7440
tgcctgctgc ttattgtcta atttacaggg atatttaatt ttgtcaggtc tatgtatatt 7500
tatccagcta tacttacttg cacagtggat tggagagaaa ggattctcca gtgtgcacac 7560
tcatcggtac tctttctgca tttccctcgt gctgtgtccc gctcgggttc caatggacag 7620
tatcagggct tgtttgactt aggtctttca gttttccttt cggttccctt ttaaaaatgt 7680
gattgttaac ctgcctcttg aaagattcaa ccgggtgtgg tggctcacgc ctgtaatccc 7740
agcacattgg gaggccaagg caggtggatc acctgaggtc agaagtttga gaccagcctg 7800
gccaacatgg tgaaaccccg tctctactaa aaatacaaaa agtagccagg cgtggtggcg 7860
tgtgcctgta gtcccagcta cttgggaggc tgaggcagga gaatagcttg aacctgggag 7920
gtggaggctg cagtgagttg agactgcacc attgcactcc agcctgggtg acaaagcagg 7980
actccgtctc aaaaaaaaag aaaagattca tgatgctgct gctcccagaa ggtttgctgg 8040
atgtgtttac ataggactct aacttgtgtg cactacagtt gttcaccagg gccagtgatt 8100
caccccagtg tgtggccaga ccatgactgt gtagcaggaa tgttttaatt tgtgcttcct 8160
tagtaaattg aaatatcagc tgagagatta tttgctgctg ttattcaaaa ggccatttat 8220
gaagttagta tttgagcccc ataagatctt taaaaagcct ccaatcattt aaaggaagaa 8280
atcagagttg ctataaaatt cagtaaaaag ctcatagcca aacggctgtg ctcagatgga 8340
aagtctgagc tgaggttggt ctcttgccaa accgtggctg ttgtgtgttg ttcttcatgt 8400
cttcgagttc attttttttc attctgccta ttctggcatc agctcacttg aggagtccct 8460
cagccttctt gtatttaagg catcgtctta gactttgtgg ctctaaagta cctgtctgtt 8520
gagatttcaa gtctcttgtc accatcctca cacatgacaa caaaacccat aatgcataag 8580
tggccttttt gaaccaagac tttgcaaact gatctctccc ccgtgaagga gttgagcaca 8640
ttagcaacaa tgtacattaa ttttggattt tcattttcat gttttatttt gtaaatatta 8700
tctgatgttt ggagcttgag tatacagact gtaaatatag ttcttgtatt tgtactaatt 8760
ctgattcttt tgctgtatag ccttagatgt gcaatgcaga cactatctaa ctgtgtgtgg 8820
taaccttgcg tcacggagct gttagtgaac gaggtaaaaa taataaaggt acagccagtg 8880
catcagaagg ttctcgatgt gcatttattc ttgcacccct tgaaaggtaa ttgcacaaag 8940
atttcttttc ttgatttgca aagataatac ctttctaaga cagaagtcac gatatcatcg 9000
cttaagtatt cctttctgat attcaaaatc gtggttttta tataaccaag aaagctaata 9060
tggtgctgtt tctttgaaag tacttttctc ccaaagatgt aagtggttat tttagctaga 9120
agatttgttg tttttctcct gaagcctcct tgtatgcctt ccaataattc tgtagtgttg 9180
tgttggcttc tttttttttt tttttttctt tttgagacag agtcttgctc tgtcacccat 9240
gctggagtgc agtgatgtga tctcgggtca ctgcaacctc tgcctcctgg gttcaagcga 9300
ttctcctgcc tcagactcct gagtagctgg gattacaggc gtgcgccacc atgcgtggct 9360
aatttttgta ttattattat tattattatt attattattt tatttattta tttatttatt 9420
ttgagatgga gtctcactct gtcgtccagg ctggagtaca gtggcatgat ctcggctcac 9480
tgcaactttc cacctcccgg gttcaagcga ttctccctgc ctcagcctcc caagtagctg 9540
ggattacagg catccatcac gcctggctga ttttattttt agtagagagg ggtttcactg 9600
tgttggccag gctggtctca agctcctgac cttgtgatcc acccgcctca gcctcccaaa 9660
gtgctaggat tacaggcttg agccaccgtg cccaacctat aattattttt ttgagatgga 9720
gtttgactct tgtcacccat gctggtgtgc aatggcacca tctcagctca ctgcaacctc 9780
tgcctcccca gttcaaacaa ttctctttcc tcagcctccc aggtagttgc aattacaggt 9840
gcccgccacc acacctggct aatttttaaa aatattttta gtagagacgg ggtttcagca 9900
cattggccag actggtctca aactcctgac ctcaggtgat ccgcccacct cggcctccca 9960
aagtgctggg attacaggcg tgagccaccg tgcctggcca tgttgggttt ttttgggtgg 10020
ggaagggtaa atggggattt gaaaagttga acatgtcaat ttaatttaac aagccctcat 10080
cacgagcaga gcatgagcag gggccactgt cctgggtgtt tggggagaar cagaagaagg 10140
gagggaggag casccctcag gttaaaagca atgatttgtt ccaaccaata tttgagtacc 10200
caccctgtgc ctggtgtgct gctgggtgca gtaattcagc agccaacaaa gcaagcccat 10260
gcacttgtgc cctcccattc cagtggggaa gacaatgttt aaacaaattt acaacatcgg 10320
tcctaagtat tttggaggaa agcagctaaa taaaaggatg aaagagggat gggaacgagg 10380
gatgtttaga tggcatagtc agggaaggcc tctcttcgga gatcacattt gagcagactc 10440
tagaatgaaa ggatagggta agccatgcaa acccctgggc aaagaggtct ccaagtagag 10500
aaatgaagcc cttgaggcag cagtgtcctc agtaggttcc agaaatagca aggtcgctgg 10560
tacagagtca gggactggga gcgcaataga agacatcaga aagtagccag ggaccaggat 10620
atttagggct taggtgagga cttggaattt cattagagtg aatgggaagc cattgaaggc 10680
tttggagcag aggagtcaca ggggcttatt tacattttta aaagaccatt ctggaaaaca 10740
gtctttgtac tcaggcaggg acaagtggaa aaaaggggca aaataagatt gttaccccag 10800
tattccaggg gaggagacgg tggctcagac taaggtagta gagtataggt gggacgaaat 10860
gggcagattc aaggtacatt ttga 10884



<210> 3 

<211> 10682 

<212> DNA 

<213> Homo sapiens 

<220>
<221> 5'UTR
<222> 1..186

<220>
<221> CDS
<222> 187..5637

<220>
<221> 3'UTR
<222> 5638..10682

<220>
<221> polyA_signal
<222> 10631..10636

<220>
<221> allele
<222> 471
<223> 12-809-119 : polymorphic base G or C

<220>
<221> allele
<222> 5487
<223> 12-602-350 : polymorphic base A or C

<220>
<221> allele
<222> 6265
<223> 12-587-379 : polymorphic base A or C

<220>
<221> allele
<222> 7887
<223> 12-596-124 : polymorphic base A or G

<220>
<221> misc_feature
<222> 2,4,5924,7850
<223> n=a, g, c or t

<400> 3 

gngnccgctg ctcgcgctga ggtgcgtccg gtgcccggcc ccccgcgccc ccgcgcgccg 60
S IT CCC '? C CgCCCgtCgg tctgcagcgc ggctgaggaa act^S 120
attaaf^ gaagatttga "ctttattt ctggactgca tatatatata taacaaggcc 180
atg tcg gga gcc tca gtg aag gtg gct gtc cgg gta agg cc 228
Met Ser Gly Ala Ser Val Lys Val Ala Val Arg Val Arg Pro
1 5 10



ph! t Ct C9a 939 aCC agc aag <* aa tec aaa tgc ate att cag atg 276 

Phe Asn Ser Arg Glu Thr Ser Lys Glu Ser Lys Cys Ile Ile Gln Met

15 20 25 30

Gin r?u %° ^ a " 3tt 3aC CCa aa * aat c « "g gaa get 324 

Gln Gly Asn Ser Thr Ser Ile Ile Asn Pro Lys Asn Pro Lys Glu Ala

35 40 45 

cca aag tcc ttc agc ttc gac tat tcc tac tgg tct cat acc tca ccc 372
Pro Lys Ser Phe Ser Phe Asp Tyr Ser Tyr Trp Ser His Thr Ser Pro



420 



50 55 60 

gaa gat ccc tgt ttt gca tct caa aac cgt gtg tac aat aac att- ™, 
Glu Asp Pro Cys Phe Ala Ser Gln Asn Arg Val Tyr Asn Asp Ile Gly

65 70 75 

aag gaa atg ctc tta cac gcc ttt gag gga tat aat gtc tgt att ttt 468
Lys Glu Met Leu Leu His Ala Phe Glu Gly Tyr Asn Val Cys Ile Phe

80 85 90 

gcs tat ggg cag act ggt gct gga aaa tct tat aca atg atg ggt aaa 516

Ala Tyr Gly Gln Thr Gly Ala Gly Lys Ser Tyr Thr Met Met Gly Lys

95 100 105 110

Gin r? 3 * 9C ^ 9 9Ct " C atC att cca ca 9 tta tgt gaa gaa ctt 564 
Gln Glu Glu Ser Gln Ala Gly Ile Ile Pro Gln Leu Cys Glu Glu Leu

115 120 125 

ttt gag aaa atc aat gac aac tgt aat gaa gaa atg tct tac tct ata 612

Phe Glu Lys Ile Asn Asp Asn Cys Asn Glu Glu Met Ser Tyr Ser Val
130 135 140 

S SJ I £ ill K S S III K E III s; i% S 22 660 

145 150 155
aat cca aaa aac aag ggt aat ttg cgt gtg cgt gaa cac cca ctt ctt 708 




Asn Pro Lys Asn Lys Gly Asn Leu Arg Val Arg Glu His Pro Leu Leu 

160 165 170 

gga ccc tat gtg gag gat ctg tcc aag ttg gca gtt act tcc tac aca 756

Gly Pro Tyr Val Glu Asp Leu Ser Lys Leu Ala Val Thr Ser Tyr Thr 
175 180 185 190 

111 ??! ?f ?' C f tC at9 9at 9Ct 999 a3C aaa 9CC ag * ^a gtg gca 804 
Asp Ile Ala Asp Leu Met Asp Ala Gly Asn Lys Ala Arg Thr Val Ala

195 200 205 

Si ^ t tg aat 9aa aCa agt a ^ c c 9t tec cac get gtg ttt acg 852 

Ala Thr Asn Met Asn Glu Thr Ser Ser Arg Ser His Ala Val Phe Thr
210 215 220 

ne vll III ?hr rTn f 39 f 33 ^ C 9at 3at 999 3CC aac ctt tcc act 900 
Ile Val Phe Thr Gln Lys Lys His Asp Asn Glu Thr Asn Leu Ser Thr

225 230 235 

gag aag gtc agt aaa atc agc ttg gtg gat cta gca gga agt gaa cga 948

Glu Lys Val Ser Lys Ile Ser Leu Val Asp Leu Ala Gly Ser Glu Arg

240 245 250

S Asd 111 ?hr Cll If IV T C9a " a 339 933 gga gca aat 996 
Ala Asp Ser Thr Gly Ala Lys Gly Thr Arg Leu Lys Glu Gly Ala Asn 

255 260 265 270

att aat aag tct ctt aca act ttg ggc aaa gtc att tca gcc ttg gcc 1044

Ile Asn Lys Ser Leu Thr Thr Leu Gly Lys Val Ile Ser Ala Leu Ala
275 280 285 

r 3 n f ? aaC ^ C aCt 3gC 339 agt aaa aa * aa 9 aa 9 aaa a « gat 1092 
Glu Val Asp Asn Cys Thr Ser Lys Ser Lys Lys Lys Lys Lys Thr Asp

290 295 300 

III p CC 9at tCt gta ctt act *W c *c ctt cga gaa aat 1140 

Phe Ile Pro Tyr Arg Asp Ser Val Leu Thr Trp Leu Leu Arg Glu Asn
305 310 315

tta ggt ggc aat tct egg act gca atg gtt get get ctg aqc ccc oco 11 an 
Leu Gly Gly Asn Ser Arg Thr Ala Met Val Ala Ala Leu Ser Pro Ala
320 325 330 

III ?f » aC 9at 939 3Ct ttg agc act ct 9 a 9 a tat gca gat cgt 1236 
Asp Ile Asn Tyr Asp Glu Thr Leu Ser Thr Leu Arg Tyr Ala Asp Arg

335 340 345 350

IS III rin ?r f 33 l qC ^ 9Ct 9tt atC aat gag gac ccc aat 9cc 1284 
Ala Lys Gln Ile Lys Cys Asn Ala Val Ile Asn Glu Asp Pro Asn Ala

355 360 365

aaa ctg gtt cgt gaa tta aag gag gag gtg aca cgg ctg aag gac ctt 1332

Lys Leu Val Arg Glu Leu Lys Glu Glu Val Thr Arg Leu Lys Asp Leu

370 375 380

ill f 9t 2f IV " C Ct9 993 gat att att gat att 9at cca ttg ate 
Leu Arg Ala Gln Gly Leu Gly Asp Ile Ile Asp Ile Asp Pro Leu Ile
385 390 395 

III III t 3C c Ct 2? a agt 993 agc aaa tat ct « aaa 9 at ttt cag aac 
Asp Asp Tyr Ser Gly Ser Gly Ser Lys Tyr Leu Lys Asp Phe Gln Asn

400 405 410

til f 39 u at 393 t3C tt9 Cta gcc tct aat caa C 9C cct ggc cat 
Asn Lys His Arg Tyr Leu Leu Ala Ser Glu Asn Gln Arg Pro Gly His

415 420 425 430

III c CC !£* 9 f 3 tCC atg 999 tCC Ctc act tca tcc cca tct tcc tgc 
Phe Ser Thr Ala Ser Met Gly Ser Leu Thr Ser Ser Pro Ser Ser Cys 
435 440 445 

til f* C c 9t o 9t C f 9 9tg 9gC ttg acg tct gt « acc a 9t att caa gag 1572 
Ser Leu Ser Ser Gln Val Gly Leu Thr Ser Val Thr Ser Ile Gln Glu
450 455 460

til 111 m! 9 t Ct ll a l Ct ^ 993 939 933 9Ct att * aa c 9t tta aag 1620 
Arg Ile Met Ser Thr Pro Gly Gly Glu Glu Ala Ile Glu Arg Leu Lys
465 470 475

n a<3 f 35 atC a " 9Ct gag ttg aat ^ aa act tgg gaa gag aag 1668 
Glu Ser Glu Lys Ile Ile Ala Glu Leu Asn Glu Thr Trp Glu Glu Lys
480 485 490

ctt cgt aaa aca gag gcc atc aga atg gag aga gag gct ttg ttg gct 1716



1380 
1428 
1476 
1524 
Leu Arg Lys Thr Glu Ala Ile Arg Met Glu Arg Glu Ala Leu Leu Ala

495 500 505 510

r? 9 ?? a 9tt 9CC 3tt C " 9aa gat gga gga acc eta ggg gtt ttc 1764 
Glu Met Gly Val Ala Ile Arg Glu Asp Gly Gly Thr Leu Gly Val Phe

515 520 525 

tca cct aaa aag acc cca cat ctt gtt aac ctc aat gaa gac cca cta 1812
Ser Pro Lys Lys Thr Pro His Leu Val Asn Leu Asn Glu Asp Pro Leu
530 535 540

atg tct gag tgc cta ctt tat tac atc aaa gat aaa att til no *n 1860
Met Ser Glu Cys Leu Leu Tyr Tyr Ile Lys Asp Gly Ile Thr Arg Val

545 550 555

r?^ f? 3 9at 9 ? fc 939 C " cgc cag gac ata 9tg ctg age ggg get 1908 
Gly Gln Ala Asp Ala Glu Arg Arg Gln Asp Ile Val Leu Ser Gly Ala

560 565 570

| - E - a S % £ £ S £ S! S £ £ S 1956 

575 580 585 590

ggg gaa gtt atc gtg acc tta gag ccc tgt gag cgc tca gaa acc 111
Gly Glu Val Ile Val Thr Leu Glu Pro Cys Glu Arg Ser Glu Thr Tyr
595 600 605 

Ill f 39 399 9tg tCC cag cct gtt ca 9 ctg tea gga aac 
Val Asn Gly Lys Arg Val Ser Gln Pro Val Gln Leu Arg Ser Gly Asn

610 615 620

To III Tit M t9 ?f 333 3aC C3t 9tt ttc cgc «* aa c cae ceg gaa 
Arg Ile Ile Met Gly Lys Asn His Val Phe Arg Phe Asn His Pro Glu

625 630 635 

caa gca cga gct gag cga gag aag act cct tct gct gag acc ccc tct

Gln Ala Arg Ala Glu Arg Glu Lys Thr Pro Ser Ala Glu Thr Pro Ser



735 740 745 750



ttc cgg aaa tgg aag tct cat cag ttt act tca tta cgg gac tta ctc
Phe Arg Lys Trp Lys Ser His Gln Phe Thr Ser Leu Arg Asp Leu Leu



755 760 765 



2004 
2052 
2100 
2148 



640 645 650

111 Pro £? T C t" ttt: gC ° Cag agg gag ctt ct g 9" aaa caa 2196 
Glu Pro Val Asp Trp Thr Phe Ala Gln Arg Glu Leu Leu Glu Lys Gln

655 660 665 670

gv Til So 2? ^ a 999 atg 939 333 agg Cta cag 9 aa atg gag 2244 

Gly Ile Asp Met Lys Gln Glu Met Glu Lys Arg Leu Gln Glu Met Glu

675 680 685
3 lf t 1*° 339 939 aag gaa gaa gca 9 at ctt ctt ttg gaq caa 22 9? 

Ile Leu Tyr Lys Lys Glu Lys Glu Glu Ala Asp Leu Leu Leu Glu Gln

690 695 700

gn ftrn T Ct9 f C I at 939 a9t 333 ttg cag gcc "g cag aag cag gtt 2340 
Gln Arg Leu Asp Tyr Glu Ser Lys Leu Gln Ala Leu Gln Lys Gln Val

705 710 715

nf 3 3 u C ° ga tCt Ctg gct gca gaa aca act gaa gag gag gaa oaa aao ma 
Glu Thr Arg Ser Leu Ala Ala Glu Thr Thr Glu Glu Glu Glu Glu Glu

720 725 730

gu gn » Ct l" 303 039 C3t gaa tt:t gag gcc caa tgg gcc 2436 

Glu Glu Val Pro Trp Thr Gln His Glu Phe Glu Leu Ala Gln Trp Ala



2484 



S gv IT 1?° f ? t3C Cta 339 gag gCC aat gcc at c agt gtg gaa 2532 
Trp Gly Asn Ala Val Tyr Leu Lys Glu Ala Asn Ala Ile Ser Val Glu

770 775 780

ctg aaa aag aag gtg cag ttt cag ttt gtt ctg ctg act gac aca cta 

Leu Lys Lys Lys Val Gln Phe Gln Phe Val Leu Leu Thr Asp Thr Leu

785 790 795

Kr lit IT T tt9 n Ct CCt 933 " a Ctt CCC act gag a tg gaa aaa act 
Tyr Ser Pro Leu Pro Pro Glu Leu Leu Pro Thr Glu Met Glu Lys Thr 
800 805 810 

IT g 9 f° 399 CCt " C CCt C9C aca g tg gta gca gta gaa gtc caa 2676 
His Glu Asp Arg Pro Phe Pro Arg Thr Val Val Ala Val Glu Val Gln

815 820 825 830

gat ttg aag aat gga gca aca cac tat tgg tct ttg gag aaa ctc aag 2724



2580 
2628 
2820
Asp Leu Lys Asn Gly Ala Thr His Tyr Trp Ser Leu Glu Lys Leu Lys

835 840 845 

cag agg ctg gat ttg atg cga gag atg tat gat agg gca ggg gag atg 2772 
Gln Arg Leu Asp Leu Met Arg Glu Met Tyr Asp Arg Ala Gly Glu Met

850 855 860 

gcc tcc agt gcc caa gac gaa agc gaa acc act gtg act ggc agc gat
Ala Ser Ser Ala Gln Asp Glu Ser Glu Thr Thr Val Thr Gly Ser Asp

865 870 875 

ccc ttc tat gat cgg ttc cac tgg ttc aaa ctt gtg ggg agc tcc ccc
Pro Phe Tyr Asp Arg Phe His Trp Phe Lys Leu Val Gly Ser Ser Pro 

880 885 890 

att ttc cac ggc tgt gtg aac gag cgc ctt gcc gac cgc aca ccc tcc
Ile Phe His Gly Cys Val Asn Glu Arg Leu Ala Asp Arg Thr Pro Ser
895 900 905 910 

ccc act ttt tcc acg gcc gat tcc gac atc act gag ctg gct gac gag
Pro Thr Phe Ser Thr Ala Asp Ser Asp Ile Thr Glu Leu Ala Asp Glu

915 920 925 

cag caa gat gag atg gag gat ttt gat gat gag gca ttc gtg gat gac 
Gln Gln Asp Glu Met Glu Asp Phe Asp Asp Glu Ala Phe Val Asp Asp

930 935 940 

gcc ggc tct gac gca ggg acg gag gag gga tca gat ctc ttc agt gac
Ala Gly Ser Asp Ala Gly Thr Glu Glu Gly Ser Asp Leu Phe Ser Asp

945 950 955

ggg cat gac ccg ttt tac gac cga tcc cct tgg ttc att tta gtg gga
Gly His Asp Pro Phe Tyr Asp Arg Ser Pro Trp Phe Ile Leu Val Gly

960 965 970 

agg gca ttt gtt tac ctg agc aat ctg ctg tat ccc gtg ccc ctg atc
Arg Ala Phe Val Tyr Leu Ser Asn Leu Leu Tyr Pro Val Pro Leu Ile
975 980 985 990 

cac agg gtg gcc atc gtc agt gag aaa ggt gaa gtg cgg gga ttt ctg
His Arg Val Ala Ile Val Ser Glu Lys Gly Glu Val Arg Gly Phe Leu

995 1000 1005

cgt gtg gct gta cag gcc atc gca gcg gat gaa gaa gct cct gat tat 3252
Arg Val Ala Val Gln Ala Ile Ala Ala Asp Glu Glu Ala Pro Asp Tyr

1010 1015 1020

ggc tct gga att cga cag tca gga aca gct aaa ata tct ttt gat aat
Gly Ser Gly Ile Arg Gln Ser Gly Thr Ala Lys Ile Ser Phe Asp Asn

1025 1030 1035 

gaa tac ttt aat cag agt gac ttt tcg tct gtt gca atg act cgt tct
Glu Tyr Phe Asn Gln Ser Asp Phe Ser Ser Val Ala Met Thr Arg Ser

1040 1045 1050 

ggt ctg tcc ttg gag gag ttg agg att gtg gaa gga cag ggt cag agt 3396
Gly Leu Ser Leu Glu Glu Leu Arg Ile Val Glu Gly Gln Gly Gln Ser
1055 1060 1065 1070 

tct gag gtc atc act cct cca gaa gaa atc agt cga att aat gac ttg
Ser Glu Val Ile Thr Pro Pro Glu Glu Ile Ser Arg Ile Asn Asp Leu

1075 1080 1085 

gat ttg aag tca agc act ttg ctg gat ggt aag atg gta atg gaa ggg
Asp Leu Lys Ser Ser Thr Leu Leu Asp Gly Lys Met Val Met Glu Gly

1090 1095 1100

ttt tct gaa gag att ggc aac cac ctg aaa ctg ggc agt gcc ttc act 
Phe Ser Glu Glu Ile Gly Asn His Leu Lys Leu Gly Ser Ala Phe Thr

1105 1110 1115

ttc cga gta aca gtg ttg cag gcc agt gga atc ctc cca gag tat gca
Phe Arg Val Thr Val Leu Gln Ala Ser Gly Ile Leu Pro Glu Tyr Ala

1120 1125 1130

gat atc ttc tgt cag ttc aac ttt ttg cat cgc cat gat gaa gca ttc
Asp Ile Phe Cys Gln Phe Asn Phe Leu His Arg His Asp Glu Ala Phe
1135 1140 1145 1150

tcc acg gag ccc ctc aaa aac aat ggc aga gga agt ccc ctg gcc ttt
Ser Thr Glu Pro Leu Lys Asn Asn Gly Arg Gly Ser Pro Leu Ala Phe

1155 1160 1165

tat cat gtg cag aat att gca gtg gag atc act gaa tca ttt gtg gat
Tyr His Val Gln Asn Ile Ala Val Glu Ile Thr Glu Ser Phe Val Asp

1170 1175 1180

tac atc aaa acc aag cct att gta ttt gaa gtc ttt ggg cat tat cao 3780

Tyr Ile Lys Thr Lys Pro Ile Val Phe Glu Val Phe Gly His Tyr Gln
1185 1190 1195

r^l r Ctt C3t ° t9 CM 9ga C3g gag ctt aac cc 9 cct cag 3828 

Gln His Pro Leu His Leu Gln Gly Gln Glu Leu Asn Ser Pro Pro Gln
1200 1205 1210

ccg tgc cgc cga ttc ttc cct cca ccc atg cca ctg tcc aag cca gtt 3876

Pro Cys Arg Arg Phe Phe Pro Pro Pro Met Pro Leu Ser Lys Pro Val
1215 1220 1225 1230

p™ S?° £° f 39 tta aaC aCg atg agc aaa acc a 9 c ctt 99= cag age 3924 
Pro Ala Thr Lys Leu Asn Thr Met Ser Lys Thr Ser Leu Gly Gln Ser 

1235 1240 1245 

atg agc aag tat gac ctc ctg gtt tgg ttt gag atc agt gaa ctg gag 3972
Met Ser Lys Tyr Asp Leu Leu Val Trp Phe Glu Ile Ser Glu Leu Glu
1250 1255 1260

d H? 3 939 tat atC cca gct gt< 3 9 tt: 9 ac cac aca gca ggc ttg 4020 

Pro Thr Gly Glu Tyr Ile Pro Ala Val Val Asp His Thr Ala Gly Leu

1265 1270 1275

cct tgc cag ggg aca ttt ttg ctt cat cag ggc atc cag cga agg atc 4068
Pro Cys Gln Gly Thr Phe Leu Leu His Gln Gly Ile Gln Arg Arg Ile
1280 1285 1290 

Thr S? ^ C ^ C C3t 939 aSg ggg agc ga 9 ctc "t tgg aaa gat 4116 

Thr Val Thr Ile Ile His Glu Lys Gly Ser Glu Leu His Trp Lys Asp

1295 1300 1305 1310

K £ 9t f t9 9tg 9ta 9gt Cgt att cgg aat aa 9 cct 939 gtg gat 4164 
Val Arg Glu Leu Val Val Gly Arg Ile Arg Asn Lys Pro Glu Val Asp

1315 1320 1325 

If 9at gCC 3tC CtC tcc cta aat a tt att tct gec aag 4212 

Glu Ala Ala Val Asp Ala Ile Leu Ser Leu Asn Ile Ile Ser Ala Lys

1330 1335 1340

tac ctg aag tct tcc cac aac tct agc agg acc ttc tac cgc ttt gag 4260 

Tyr Leu Lys Ser Ser His Asn Ser Ser Arg Thr Phe Tyr Arg Phe Glu 

1345 1350 1355

If t" 931 a9 ° tCt ° tg Cat aaC tcc ctt c tt ctg aac cga gtg 4 308 

Ala Val Trp Asp Ser Ser Leu His Asn Ser Leu Leu Leu Asn Arg Val
1360 1365 1370

l cc l at gg a g aa aa g a tc t ac a t g acc ttg t cg gC c t ac cta gag 4356 

Thr Pro Tyr Gly Glu Lys Ile Tyr Met Thr Leu Ser Ala Tyr Leu Glu

1375 1380 1385 1390

fll f at ~ 9 ° 3tC Ca9 CCg 9Ct gtC atc acc aa g 9 at gtg tgc atg 4404 

Leu Asp His Cys Ile Gln Pro Ala Val Ile Thr Lys Asp Val Cys Met

1395 1400 1405 

£? dk C o° C Cga gat 9CC aag atc tca cca cca c g c tct ctg cgt 4452 
Val Phe Tyr Ser Arg Asp Ala Lys Ile Ser Pro Pro Arg Ser Leu Arg

1410 1415 1420

agc ctc ttt ggc agc ggc tac tca aag tca cca gat tcg aat cga gtc 4500
Ser Leu Phe Gly Ser Gly Tyr Ser Lys Ser Pro Asp Ser Asn Arg Val
1425 1430 1435

Thr r? C rf l*° 9 f 3 Ct ° 390 " a t9C 339 atg tCa g ac a ca ggt agt 4548 
Thr Gly Ile Tyr Glu Leu Ser Leu Cys Lys Met Ser Asp Thr Gly Ser
1440 1445 1450 

cca ggt atg cag aga agg aga aga aaa atc tta gat acg tca gtg gca 4596 
Pro Gly Met Gln Arg Arg Arg Arg Lys Ile Leu Asp Thr Ser Val Ala
1455 1460 1465 1470

Tvr fJ 1" r? 3 Hf 3 2f 9 aaC tta 9Ca ggc tgg cgg ccc cgt gga gac 4644 
Tyr Val Arg Gly Glu Glu Asn Leu Ala Gly Trp Arg Pro Arg Gly Asp 

1475 1480 1485 

Ser Su lit J 5 ? rt 9 ?° ^ 9 t" 939 Ctg 939 aag Ctg gag ctc c ta 4692 
Ser Leu Ile Leu Glu His Gln Trp Glu Leu Glu Lys Leu Glu Leu Leu

1490 1495 1500 

cat gag gtg gaa aaa acc cgc cac ttt ttg ctg ctg cgt gag aga ctt 4740 
His Glu Val Glu Lys Thr Arg His Phe Leu Leu Leu Arg Glu Arg Leu
1505 1510 1515

r?f, l SC c 9C x^ C Z CC 333 tCC Ctg agc gac tc 9 tta tcc ccc age etc 

Gly Asp Ser Ile Pro Lys Ser Leu Ser Asp Ser Leu Ser Pro Ser Leu
1520 1525 1530 

agc agt ggg acc ctc agc acc tcc acc agt atc tcc tct cag atc tca
Ser Ser Gly Thr Leu Ser Thr Ser Thr Ser Ile Ser Ser Gln Ile Ser
1535 1540 1545 1550 

acc act acc ttt gaa agc gcc atc aca cct agc gag agc agt ggc tat
Thr Thr Thr Phe Glu Ser Ala Ile Thr Pro Ser Glu Ser Ser Gly Tyr
1555 1560 1565

III Q Ca 2 ga ? ac atc 9aa agc ctg gtg gac cqa g a s aaa g a g ct g get 

Asp Ser Gly Asp Ile Glu Ser Leu Val Asp Arg Glu Lys Glu Leu Ala

1570 1575 1580

acc aag tgc ctg caa ctt ctc acc cac act ttc aac aga gaa ttc agc
Thr Lys Cys Leu Gln Leu Leu Thr His Thr Phe Asn Arg Glu Phe Ser
1585 1590 1595 

r?n u 3C IT t 9 ° 9tC agt gaC tgt aag "9 tct gat atc tct cca 

Gln Val His Gly Ser Val Ser Asp Cys Lys Leu Ser Asp Ile Ser Pro

1600 1605 1610

r?H t" n C ° tCt 939 tCC agt ttc agc agt 9 CC acc «=tc act 
Ile Gly Arg Asp Pro Ser Glu Ser Ser Phe Ser Ser Ala Thr Leu Thr

1615 1620 1625 1630

Pr C T C c CC tu C tgt CCC tCt Ctg gta gac tct a 9S a 9= aac tct ctg 
Pro Ser Ser Thr Cys Pro Ser Leu Val Asp Ser Arg Ser Asn Ser Leu 

1635 1640 1645

III rf 9 f 39 !f C CCa gaa gCC aat tcc cgg 9 CC tct a 9t ccc tgc cca 5172 
Asp Gln Lys Thr Pro Glu Ala Asn Ser Arg Ala Ser Ser Pro Cys Pro
1650 1655 1660

Gin III r 33 r 39 L" C f 9 a " gtC CCa gct gtg gaa aca tat ttg 5220 

Glu Phe Glu Gln Phe Gln Ile Val Pro Ala Val Glu Thr Pro Tyr Leu

1665 1670 1675 

gcc cga gca gga aaa aac gaa ttt ctc aat ctt gtt cca gat att aaa 5268

Ala Arg Ala Gly Lys Asn Glu Phe Leu Asn Leu Val Pro Asp Ile Glu

1680 1685 1690 

Glu ztt lln t° a c 9C c" 9tg 9tC tCt 339 aaa gga tac ctt «t ttc 
Glu Ile Arg Pro Ser Ser Val Val Ser Lys Lys Gly Tyr Leu His Phe

1695 1700 1705 1710

ill rt 9 o Ct T Ctt ta ° agt aaC tgg gct aaa cat ttt gtt gtc gtc cgt 
Lys Glu Pro Leu Tyr Ser Asn Trp Ala Lys His Phe Val Val Val Arg 

1715 1720 1725 

cgg cct tat gtc ttc atc tat aac agt gac aaa gac cct gtg gag cgt 5412
Arg Pro Tyr Val Phe Ile Tyr Asn Ser Asp Lys Asp Pro Val Glu Arg
1730 1735 1740 

III i\t ?f f tg c CC 3Ca 9Ca Cag gtg gag tac agt ga 5 cag 
Gly Ile Ile Asn Leu Ser Thr Ala Gln Val Glu Tyr Ser Glu Asp Gln

1745 1750 1755 

cag gcc atg gtg aag aca cca aac acm ttt gct gtc tgc aca aag cac

Gln Ala Met Val Lys Thr Pro Asn Thr Phe Ala Val Cys Thr Lys His

1760 1765 1770

til r 99 ,f? , Ctt ttg Cag 9CC CtC aat gac aaa a ac atg aac gac tgg 5556 
Arg Gly Val Leu Leu Gln Ala Leu Asn Asp Lys Asp Met Asn Asp Trp

1775 1780 1785 1790

ttg tat gcc ttc aac cca ctt cta gct ggc aca ata cgg tca aag ctt
Leu Tyr Ala Phe Asn Pro Leu Leu Ala Gly Thr Ile Arg Ser Lys Leu

1795 1800 1805 

tcc cgc aga tgc ccg agc cag tcg aaa tac taagtgactc tgccgagtgc 5654
Ser Arg Arg Cys Pro Ser Gln Ser Lys Tyr

1810 1815 
™ aCtC9C c " cgagaga taaagaaagc gttacctctc atttctcttt gtgattcttg 5714 
acggtgactc ttgtatgtaa tcctgtggct taactacttc tccctccttg ?ccagcact? 5774
cccgttcccc atctccattg ctctgtactc ttttctttt! tcttgtgctg 5834
agaatctcgt tagtagcatg tggectaaca aaaggaaaaa atgtttttaa acacacacac 5894 
acacacacac acacacacac acatacacan acaaaaacac aaaaactctg aggggatctg 5954
gtgaatctcc aaattattgt gggtgtactt tggcttcctt ttgtatgata ggtccccatc 6014
atgaccacct ctgatgtctg tgctgctgtc accaggcacc tttgtttttc aagacaacat 6074
actttttttt tcttttctct gtttgtgata tcactttaat ttttcttggg tggcttagag 6134
actaagggag gagacatctg gcctttttag aacctgagag gaaaaaaaga gtcttttttt 6194
cccctctgtc tctttttgcc atggctaatc cctgcatttc cattcaggga aaaggtggta 6254
gtgagcatag mactgcaaca gttatattct gagtcaaagt tggggctttt tacggcataa 6314
ttatggaatt tttatttact ggtagagagg agacgagagg ctttttcagt gggcctggga 6374
cagtggctgc tcttgacttt gtgtgaaggg aaatgccaag gatgcttctg gtggacttca 6434
ggggacccca gggtttggcc gtgggccgtg atggcagcag gcggtgggat gcttgtagct 6494
cctcacagca ggattcctgc ccactgtttt ttctctgttg ggagggaagc tcttttctag 6554
gagtgtctca gttctgcttt tggcattagt gatggtggtg gtacagttgg aattagtgcc 6614
atgtcataca caaatgttcc acaaggcggg agtgtttcac tttctggtga taaacttgat 6674
ggtcattgtt atgattaaga taatgccggg caggccgggc acagtggctc acgcctgtaa 6734
tccaagcact tggggaggcc gaggcgggca gatcacgaga tcaggagttc aagaccagcc 6794
tggccaatat gatgaaaccc cgtctctact aaaaatacaa aattagtcgg gtatggtggc 6854
acatgcctgt aattccagct gcttgggagc ctgaggcagg agaactgctt gaacccagga 6914
ggcagaggtt gcagtgagcc aagatcgcgc tattgcactc cagcctgggt gacagagcaa 6974
gactctgcct cagaaaaaaa aaaaaaaata ataatgctgg gtagtgacct tgtgattgtt 7034
acagctccct ttgatcaaag aaatatagct ttcaggcata aacctggaag tctccctctg 7094
aatccagcag ttgttttcat tgatgcttgt caggttgaag atgctttcag tgatgctctc 7154
tatactcata aataagcaaa atgtggcagg ctttgctttc tggatcccag gattaaaact 7214
aaccgtgacc actactccaa acaaaacaca atatgcctag gggcacggat gaacgtccag 7274
ggagcccggg ccccaggctt tgttgcgtgt tccctgctcc tctccatctg gtgtggaaac 7334
actgcccagg gagaaaggag gaagctcact gtggacagtc ttctttcctt ctgacagacc 7394
aggtcatctg gcttccgaga tcatcagaga agataagtct gtctctttca gctgccagta 7454
agttttccag gatgagaggg gaaaaagaaa gcctccagtg acttcagttg ctttgccagt 7514
tgtcttggga ttgttttaca ccatccttta cttcccttgc tcagacctct ctgtttcacc 7574
attgctcagg cattcaggaa agtatctgct cactcccact tggtgagtcc tcggccttga 7634
ggttgctgac tctcaggcgt taggcagctg gatgacttcc cgcttcatgc agcaaaggcc 7694
aggggcttgc gcgcctctgc agagttgttg ctagggagac ttgtgtcatc atccacaacc 7754
ttgtttctca cttcctggtt gggctcatct ctgaagaaca ggtctcccag cttcgctcct 7814
tatcactgca ttgtgaagag gaggaaaagt gaatcncgga gagagaaagg aaaggataga 7874
atcacaggct gcrtctgcac ctgaaaagtg acccgcggaa actctatggc ggattttttt 7934
tttaactttc ttcttcctgt taaaacatag gtcactaact gtgatgttat ttgttttcta 7994
agtggtatgt gagattttct aatgtagtta gaagtttcat tgtctgatgg acacaatatg 8054
cccttccggt tctattcaaa ccagcaggat ctgtcggtgc ttagagatgg ctgcctggac 8114
tggaatcaaa tctaatttca gggaaatgaa gatggaattt gaaggtcact tttaaaatta 8174
agtcattgat gctgctgtta cagagtgtga cagaggatcc atgtctgtga cacaggacgg 8234
tgggaagcct gagagagagt gaaattatgt gatacactga aatgactttt gtttttcttc 8294
taactcatac aaaactggtt tggaaagtct ttgctttgga agcgtcagac attagaacag 8354
gccaaactgg actgtctgtt catagcgtgc ctgaataaga aggcctctta gggagccaga 8414
gggagcagag tggtcgtgtc ctgcgtgctc ttcaccctct ggggcgcccc tgctgcggct 8474
ggcaggtgca gacagccttt gctggtcccc agcacgtcca gggtgggtgc tcccttgccc 8534
gacagaacca tccccactgt gaggctgtga gagatttgtg gcaggaactg tttatgaggc 8594
tctagttgtt gctgttgtgg cgggaaagtt aagaaacata gcccttaagg aaaccacctt 8654
tatgtatttt cttaaagcac gcctttaaat aagcaaaaac tttaaaaggc aggaaagaga 8714
attcttaggc aaattcagag aaataagtgc tagttaatac taatcacctc ctcctctgtc 8774
tctcatcctc ctttctccca tcaaagcaaa atatggcctc accaccagcc ccaaatcagt 8834
gctcagaccc tctctgtgtc tgtgtgccct cctgggagtc agtcagcgct caggccagga 8894
ctgtgcaggg ccagccagcc catgcgctag tcaggagcac aggcaagggg tgcttgtggc 8954
agtggccggg cacctgagcc ccagcttgtt gttaaacgtg ctgacggcaa ggggcaatgg 9014
agtgagtttc ccaactaaga aaccactatt atatattttt tcccttcagt cacatagact 9074
tcagacaact ctcctatttt ttatggattt ttcagctcat ttcagatgaa ggaactaagt 9134
cattgtgaac tgtctcttga gatctaaaaa caagatgact tttcctggca catattccaa 9194
agcaaagact ttgttgcctg ctgcttattg tctaatttac agggatattt aattttgtca 9254
ggtctatgta tatttatcca gctatactta cttgcacagt ggattggaga gaaaggattc 9314
tccagtgtgc acactcatcg gtactctttc tgcatttccc tcgtgctgtg tcccgctcgg 9374
gttccaatgg acagtatcag ggcttgtttg acttaggtct ttcagttttc ctttcggttc 9434
ccttttaaaa atgtgattgt taacctgcct cttgaaagat tcaaccgggt gtggtggctc 9494
acgcctgtaa tcccagcaca ttgggaggcc aaggcaggtg gatcacctga ggtcaggagt 9554
ttgagaccag cctggccaac atggtgaaac cccgtctcta ctaaaaatac aaaaagtagc 9614
caggcgtggt ggcgtgtgcc tgtagtccca gctacttggg aggctgaggc aggagaatag 9674
cttgaacctg ggaggtggag gctgcagtga gttgagactg caccattgca ctccagcctg 9734

ggtgacaaag caggactccg tctcaaaaaa aaagaaaaga ttcatgatgc tgctgctccc 9794 

agaaggtttg ctggatgtgt ttacatagga ctctaacttg tgtgcactac agttgttcac 9854 

cagggccagt gattcacccc agtgtgtggc cagaccatga ctgtgtagca ggaatgtttt 9914 

aatttgtgct tccttagtaa attgaaatat cagctgagag attatttgct gctgttattc 9974 

aaaaggccat ttatgaagtt agtatttgag ccccataaga tctttaaaaa gcctccaatc 10034 

atttaaagga agaaatcaga gttgctataa aattcagtaa aaagctcata gccaaaacgg 10094 

ctgtgctcag atggaaagtc tgagctgagg ttggtctctt gccaaaccgt ggctgttgtg 10154 

tgttgttctt catgtcttcg agttcatttt ttttcattct gcctattctg gcatcagctc 10214 

acttgaggag tccctcagcc ttcttgtatt taaggcatcg tcttagactt tgtggctcta 10274 

aagtacctgt ctgttgagat ttcaagtctc ttgtcaccat cctcacacat gacaacaaaa 10334 

cccataatgc ataagtggcc tttttgaacc aagactttgc aaactgatct ctcccccgtg 10394 

aaggagttga gcacattagc aacaatgtac attaattttg gattttcatt ttcatgtttt 10454 

attttgtaaa tattatctga tgtttggagc ttgagtatac agactgtaaa tatagttctt 10514 

gtatttgtac taattctgat tcttttgctg tatagcctta gatgtgcaat gcagacacta 10574 

tctaactgtg tgtggtaacc ttgcgtcacg gagctgttag tgaacgaggt aaaaataata 10634 

aaggtacagc cagtgcatca aaaaaaaaaa aaaaaaaaaa aaaaaaaa 10682 

<210> 4 

<211> 1816 

<212> PRT 

<213> Homo sapiens 

<400> 4 

Met Ser Gly Ala Ser Val Lys Val Ala Val Arg Val Arg Pro Phe Asn
1 5 10 15

Ser Arg Glu Thr Ser Lys Glu Ser Lys Cys Ile Ile Gln Met Gln Gly
20 25 30

Asn Ser Thr Ser Ile Ile Asn Pro Lys Asn Pro Lys Glu Ala Pro Lys
35 40 45

Ser Phe Ser Phe Asp Tyr Ser Tyr Trp Ser His Thr Ser Pro Glu Asp
50 55 60

Pro Cys Phe Ala Ser Gln Asn Arg Val Tyr Asn Asp Ile Gly Lys Glu
65 70 75 80

Met Leu Leu His Ala Phe Glu Gly Tyr Asn Val Cys Ile Phe Ala Tyr
85 90 95

Gly Gln Thr Gly Ala Gly Lys Ser Tyr Thr Met Met Gly Lys Gln Glu
100 105 110

Glu Ser Gln Ala Gly Ile Ile Pro Gln Leu Cys Glu Glu Leu Phe Glu
115 120 125

Lys Ile Asn Asp Asn Cys Asn Glu Glu Met Ser Tyr Ser Val Glu Val
130 135 140

Ser Tyr Met Glu Ile Tyr Cys Glu Arg Val Arg Asp Leu Leu Asn Pro
145 150 155 160

Lys Asn Lys Gly Asn Leu Arg Val Arg Glu His Pro Leu Leu Gly Pro
165 170 175

Tyr Val Glu Asp Leu Ser Lys Leu Ala Val Thr Ser Tyr Thr Asp Ile
180 185 190

Ala Asp Leu Met Asp Ala Gly Asn Lys Ala Arg Thr Val Ala Ala Thr
195 200 205

Asn Met Asn Glu Thr Ser Ser Arg Ser His Ala Val Phe Thr Ile Val
210 215 220

Phe Thr Gln Lys Lys His Asp Asn Glu Thr Asn Leu Ser Thr Glu Lys
225 230 235 240

Val Ser Lys Ile Ser Leu Val Asp Leu Ala Gly Ser Glu Arg Ala Asp
245 250 255

Ser Thr Gly Ala Lys Gly Thr Arg Leu Lys Glu Gly Ala Asn Ile Asn
260 265 270

Lys Ser Leu Thr Thr Leu Gly Lys Val Ile Ser Ala Leu Ala Glu Val
275 280 285

Asp Asn Cys Thr Ser Lys Ser Lys Lys Lys Lys Lys Thr Asp Phe Ile
290 295 300

Pro Tyr Arg Asp Ser Val Leu Thr Trp Leu Leu Arg Glu Asn Leu Gly
305 310 315 320

Gly Asn Ser Arg Thr Ala Met Val Ala Ala Leu Ser Pro Ala Asp Ile
325 330 335

Asn Tyr Asp Glu Thr Leu Ser Thr Leu Arg Tyr Ala Asp Arg Ala Lys
340 345 350

Gln Ile Lys Cys Asn Ala Val Ile Asn Glu Asp Pro Asn Ala Lys Leu
355 360 365

Val Arg Glu Leu Lys Glu Glu Val Thr Arg Leu Lys Asp Leu Leu Arg
370 375 380

Ala Gln Gly Leu Gly Asp Ile Ile Asp Ile Asp Pro Leu Ile Asp Asp
385 390 395 400

Tyr Ser Gly Ser Gly Ser Lys Tyr Leu Lys Asp Phe Gln Asn Asn Lys
405 410 415

His Arg Tyr Leu Leu Ala Ser Glu Asn Gln Arg Pro Gly His Phe Ser
420 425 430

Thr Ala Ser Met Gly Ser Leu Thr Ser Ser Pro Ser Ser Cys Ser Leu
435 440 445

Ser Ser Gln Val Gly Leu Thr Ser Val Thr Ser Ile Gln Glu Arg Ile
450 455 460

Met Ser Thr Pro Gly Gly Glu Glu Ala Ile Glu Arg Leu Lys Glu Ser
465 470 475 480

Glu Lys Ile Ile Ala Glu Leu Asn Glu Thr Trp Glu Glu Lys Leu Arg
485 490 495

Lys Thr Glu Ala Ile Arg Met Glu Arg Glu Ala Leu Leu Ala Glu Met
500 505 510

Gly Val Ala Ile Arg Glu Asp Gly Gly Thr Leu Gly Val Phe Ser Pro
515 520 525

Lys Lys Thr Pro His Leu Val Asn Leu Asn Glu Asp Pro Leu Met Ser
530 535 540

Glu Cys Leu Leu Tyr Tyr Ile Lys Asp Gly Ile Thr Arg Val Gly Gln
545 550 555 560

Ala Asp Ala Glu Arg Arg Gln Asp Ile Val Leu Ser Gly Ala His Ile
565 570 575

Lys Glu Glu His Cys Ile Phe Arg Ser Glu Arg Ser Asn Ser Gly Glu
580 585 590

Val Ile Val Thr Leu Glu Pro Cys Glu Arg Ser Glu Thr Tyr Val Asn
595 600 605

Gly Lys Arg Val Ser Gln Pro Val Gln Leu Arg Ser Gly Asn Arg Ile
610 615 620

Ile Met Gly Lys Asn His Val Phe Arg Phe Asn His Pro Glu Gln Ala
625 630 635 640

Arg Ala Glu Arg Glu Lys Thr Pro Ser Ala Glu Thr Pro Ser Glu Pro
645 650 655

Val Asp Trp Thr Phe Ala Gln Arg Glu Leu Leu Glu Lys Gln Gly Ile
660 665 670

Asp Met Lys Gln Glu Met Glu Lys Arg Leu Gln Glu Met Glu Ile Leu
675 680 685

Tyr Lys Lys Glu Lys Glu Glu Ala Asp Leu Leu Leu Glu Gln Gln Arg
690 695 700

Leu Asp Tyr Glu Ser Lys Leu Gln Ala Leu Gln Lys Gln Val Glu Thr
705 710 715 720

Arg Ser Leu Ala Ala Glu Thr Thr Glu Glu Glu Glu Glu Glu Glu Glu
725 730 735

Val Pro Trp Thr Gln His Glu Phe Glu Leu Ala Gln Trp Ala Phe Arg
740 745 750

Lys Trp Lys Ser His Gln Phe Thr Ser Leu Arg Asp Leu Leu Trp Gly
755 760 765

Asn Ala Val Tyr Leu Lys Glu Ala Asn Ala Ile Ser Val Glu Leu Lys
770 775 780

Lys Lys Val Gln Phe Gln Phe Val Leu Leu Thr Asp Thr Leu Tyr Ser
785 790 795 800

Pro Leu Pro Pro Glu Leu Leu Pro Thr Glu Met Glu Lys Thr His Glu
805 810 815

Asp Arg Pro Phe Pro Arg Thr Val Val Ala Val Glu Val Gln Asp Leu
820 825 830

Lys Asn Gly Ala Thr His Tyr Trp Ser Leu Glu Lys Leu Lys Gln Arg
835 840 845

Leu Asp Leu Met Arg Glu Met Tyr Asp Arg Ala Gly Glu Met Ala Ser
850 855 860

Ser Ala Gln Asp Glu Ser Glu Thr Thr Val Thr Gly Ser Asp Pro Phe
865 870 875 880

Tyr Asp Arg Phe His Trp Phe Lys Leu Val Gly Ser Ser Pro Ile Phe
885 890 895

His Gly Cys Val Asn Glu Arg Leu Ala Asp Arg Thr Pro Ser Pro Thr
900 905 910

Phe Ser Thr Ala Asp Ser Asp Ile Thr Glu Leu Ala Asp Glu Gln Gln
915 920 925

Asp Glu Met Glu Asp Phe Asp Asp Glu Ala Phe Val Asp Asp Ala Gly
930 935 940

Ser Asp Ala Gly Thr Glu Glu Gly Ser Asp Leu Phe Ser Asp Gly His
945 950 955 960

Asp Pro Phe Tyr Asp Arg Ser Pro Trp Phe Ile Leu Val Gly Arg Ala
965 970 975

Phe Val Tyr Leu Ser Asn Leu Leu Tyr Pro Val Pro Leu Ile His Arg
980 985 990

Val Ala Ile Val Ser Glu Lys Gly Glu Val Arg Gly Phe Leu Arg Val
995 1000 1005

Ala Val Gln Ala Ile Ala Ala Asp Glu Glu Ala Pro Asp Tyr Gly Ser
1010 1015 1020

Gly Ile Arg Gln Ser Gly Thr Ala Lys Ile Ser Phe Asp Asn Glu Tyr
1025 1030 1035 1040

Phe Asn Gln Ser Asp Phe Ser Ser Val Ala Met Thr Arg Ser Gly Leu
1045 1050 1055

Ser Leu Glu Glu Leu Arg Ile Val Glu Gly Gln Gly Gln Ser Ser Glu
1060 1065 1070

Val Ile Thr Pro Pro Glu Glu Ile Ser Arg Ile Asn Asp Leu Asp Leu
1075 1080 1085

Lys Ser Ser Thr Leu Leu Asp Gly Lys Met Val Met Glu Gly Phe Ser
1090 1095 1100

Glu Glu Ile Gly Asn His Leu Lys Leu Gly Ser Ala Phe Thr Phe Arg
1105 1110 1115 1120

Val Thr Val Leu Gln Ala Ser Gly Ile Leu Pro Glu Tyr Ala Asp Ile
1125 1130 1135

Phe Cys Gln Phe Asn Phe Leu His Arg His Asp Glu Ala Phe Ser Thr
1140 1145 1150

Glu Pro Leu Lys Asn Asn Gly Arg Gly Ser Pro Leu Ala Phe Tyr His
1155 1160 1165

Val Gln Asn Ile Ala Val Glu Ile Thr Glu Ser Phe Val Asp Tyr Ile
1170 1175 1180

Lys Thr Lys Pro Ile Val Phe Glu Val Phe Gly His Tyr Gln Gln His
1185 1190 1195 1200

Pro Leu His Leu Gln Gly Gln Glu Leu Asn Ser Pro Pro Gln Pro Cys
1205 1210 1215

Arg Arg Phe Phe Pro Pro Pro Met Pro Leu Ser Lys Pro Val Pro Ala
1220 1225 1230

Thr Lys Leu Asn Thr Met Ser Lys Thr Ser Leu Gly Gln Ser Met Ser
1235 1240 1245

Lys Tyr Asp Leu Leu Val Trp Phe Glu Ile Ser Glu Leu Glu Pro Thr
1250 1255 1260

Gly Glu Tyr Ile Pro Ala Val Val Asp His Thr Ala Gly Leu Pro Cys
1265 1270 1275 1280

Gln Gly Thr Phe Leu Leu His Gln Gly Ile Gln Arg Arg Ile Thr Val
1285 1290 1295

Thr Ile Ile His Glu Lys Gly Ser Glu Leu His Trp Lys Asp Val Arg
1300 1305 1310

Glu Leu Val Val Gly Arg Ile Arg Asn Lys Pro Glu Val Asp Glu Ala
1315 1320 1325

Ala Val Asp Ala Ile Leu Ser Leu Asn Ile Ile Ser Ala Lys Tyr Leu
1330 1335 1340

Lys Ser Ser His Asn Ser Ser Arg Thr Phe Tyr Arg Phe Glu Ala Val
1345 1350 1355 1360

Trp Asp Ser Ser Leu His Asn Ser Leu Leu Leu Asn Arg Val Thr Pro
1365 1370 1375

Tyr Gly Glu Lys Ile Tyr Met Thr Leu Ser Ala Tyr Leu Glu Leu Asp
1380 1385 1390

His Cys Ile Gln Pro Ala Val Ile Thr Lys Asp Val Cys Met Val Phe
1395 1400 1405

Tyr Ser Arg Asp Ala Lys Ile Ser Pro Pro Arg Ser Leu Arg Ser Leu
1410 1415 1420

Phe Gly Ser Gly Tyr Ser Lys Ser Pro Asp Ser Asn Arg Val Thr Gly
1425 1430 1435 1440

Ile Tyr Glu Leu Ser Leu Cys Lys Met Ser Asp Thr Gly Ser Pro Gly
1445 1450 1455

Met Gln Arg Arg Arg Arg Lys Ile Leu Asp Thr Ser Val Ala Tyr Val
1460 1465 1470

Arg Gly Glu Glu Asn Leu Ala Gly Trp Arg Pro Arg Gly Asp Ser Leu
1475 1480 1485

Ile Leu Glu His Gln Trp Glu Leu Glu Lys Leu Glu Leu Leu His Glu
1490 1495 1500

Val Glu Lys Thr Arg His Phe Leu Leu Leu Arg Glu Arg Leu Gly Asp 

1^10 1515 1 ^9fi 

Ser lie Pro Lys Ser Leu Ser Asp Ser Leu Ser Pro Ser Leu Ser Ser 
^ ^ 1525 1530 1535 

Gly Thr Leu Ser Thr Ser Thr Ser lie Ser Ser Gin lie Ser Thr Thr 

1540 1545 1550 

Thr Phe Glu Ser Ala lie Thr Pro Ser Glu Ser Ser Gly Tyr Asp Ser 

1555 1560 i5 6 5 

Gly Asp lie Glu Ser Leu Val Asp Arg Glu Lys Glu Leu Ala Thr Lys 
1570 1575 1580 y 

Cys Leu Gin Leu Leu Thr His Thr Phe Asn Arg Glu Phe Ser Gin Val 

1590 1595 lg00 

His Gly Ser Val Ser Asp Cys Lys Leu Ser Asp lie Ser Pro He Gly 

1605 1610 i 6 i5 

Arg Asp Pro Ser Glu Ser Ser Phe Ser Ser Ala Thr Leu Thr Pro Ser 

1620 1625 1630 

Ser Thr Cys Pro Ser Leu Val Asp Ser Arg Ser Asn Ser Leu Asp Gin 

1635 1640 1645 

Lys Thr Pro Glu Ala Asn Ser Arg Ala Ser Ser Pro Cys Pro Glu Phe 

1650 1655 i 66 o 

Glu Gin Phe Gin lie Val Pro Ala Val Glu Thr Pro Tyr Leu Ala Arg 
1665 1670 1675 1680 

Ala Gly Lys Asn Glu Phe Leu Asn Leu Val Pro Asp Ile Glu Glu Ile
1685 1690 1695

Arg Pro Ser Ser Val Val Ser Lys Lys Gly Tyr Leu His Phe Lys Glu
1700 1705 1710

Pro Leu Tyr Ser Asn Trp Ala Lys His Phe Val Val Val Arg Arg Pro
1715 1720 1725

Tyr ™ Phe Ile Tyr Asn Ser Asp Asp Pro Val Glu Arg Gly Ile
1730 1735 1740

Ile Asn Leu Ser Thr Ala Gln Val Glu Tyr Ser Glu Asp Gln Gln Ala
1745 1750 1755 1760

Met Val Lys Thr Pro Asn Thr Phe Ala Val Cys Thr Lys His Arg Gly
1765 1770 1775

Val Leu Leu Gln Ala Leu Asn Asp Lys Asp Met Asn Asp Trp Leu Tyr
1780 1785 1790

Ala Phe Asn Pro Leu Leu Ala Gly Thr Ile Arg Ser Lys Leu Ser Arg
1795 1800 1805

Arg Cys Pro Ser Gln Ser Lys Tyr
1810 1815

<210> 5

<211> 357
<212> DNA
<213> Homo sapiens

<220>
<221> allele
<222> 178
<223> 10-265-178 : polymorphic base A or G

<220>
<221> misc_binding
<222> 166..190
<223> 10-265-178.probe

<220>
<221> primer_bind
<222> 159..177
<223> 10-265-178.mis

<220>
<221> primer_bind
<222> 179..197
<223> 10-265-178.mis complement

<220>
<221> primer_bind
<222> 1..18
<223> 10-265.pu

<220>
<221> primer_bind
<222> 338..357
<223> 10-265.rp complement

<400> 5

ccaaatgtgg aaggaccgga cccctgggtt gcagcgcgtc gagcggtgct gactctttcc 60
tttgttctgt ttctgcctct ctagagctga catcgcgctg atcggattgg ccgtcatggg 120
ccagaactta attctgaaca tgaatgacca cggctttgtg gtaagcggcg tgggcgcrtt 180
gtcttctctc tggttcccgg gcgctttagc cgaggccggc gataggtttg ggagcttacg 240
ggtctcctgg ccgtgctttg ctaatgtgct ctgttgctgc tcgtggcatt tttgtatgga 300
aaggagaagc accctgtagg cgtgggcggc cgatcccgaa cttagtcctg cggagtg 357

<210> 6
<211> 420
<212> DNA
<213> Homo sapiens

<220>
<221> allele
<222> 203
<223> 10-266-203 : polymorphic base C or T

<220>
<221> misc_binding
<222> 191..215
<223> 10-266-203.probe

<220>
<221> primer_bind
<222> 184..202
<223> 10-266-203.mis

<220>

<221> primer_bind
<222> 204..222
<223> 10-266-203.mis complement

<220>
<221> primer_bind
<222> 1..20
<223> 10-266.pu

<220>
<221> primer_bind
<222> 401..420
<223> 10-266.rp complement

<400> 6

ggtgacataa ctttacagtg aagctcccac agctggggga agaaaaggca aaggcaggtc 60
tgacctcccc agaacttggg ttgaatggaa aggtcattgt tactttggcc acagtctgaa 120
agtcttgtgt gtcctggatc tcctactcag gacttttgtc cttctaggtc tgtgctttta 180
ataggactgt ctccaaagtt gaygatttct tggccaatga ggcaaaggga accaaagtgg 240
tgggtgccca gtccctgaaa gagatggtct ccaagctgaa gaagccccgg cggatcatcc 300
tcctggtgaa ggctgggcaa gctgtggatg atttcatcga gaaattggtg aggccagctg 360
tgctctcagc tgctaccacg atagcagctg tttttggttt cttcctttag ttctccttct 420

<210> 7
<211> 465
<212> DNA
<213> Homo sapiens

<220>
<221> allele
<222> 118
<223> 12-592-118 : polymorphic base A

<220>
<221> misc_binding
<222> 106..130
<223> 12-592-118.probe

<220>
<221> primer_bind
<222> 99..117
<223> 12-592-118.mis



<220>
<221> primer_bind
<222> 119..137
<223> 12-592-118.mis complement

<220>
<221> primer_bind
<222> 1..19
<223> 12-592.pu

<220>
<221> primer_bind
<222> 448..465
<223> 12-592.rp complement

<400> 7



gagttcagtt tgagttcagc tacgagaatt atttagttga gtgagtgtca gagctqoaat 60
tttcaaatct gcccttactg gtatgttgct ttacacaccc tcttacgtaa atcalacwag 120
gattccaccc tgggaggttt gcatagaggg ctgttcttgt agaacttgtg ctcatgcttt 180
Ttallt^ """f •Wgcaagcc agaaagtttt tctggtgga? aataa?gtgg 240
tcaS" C ttaa9 ^ t,: taagccaagc acttgagttt ctaacalcta aaaagcLag 300
f C acagctctag cgcgccctgg cttgattctg ttcatcccca gggggagac? 360
tgcctttgtt ccagtcctgc tctcccaagc cagcttactg tagttttcca gcaattctga 420
gaagcagtat tttttactgc tgattagacc ttaacatgga aatgg 9 ca «tct g a «0



<210> 8
<211> 449
<212> DNA
<213> Homo sapiens

<220>
<221> allele
<222> 420
<223> 12-783-421 : polymorphic base C or T

<220>
<221> misc_binding
<222> 408..432
<223> 12-783-421.probe

<220>
<221> primer_bind
<222> 401..419
<223> 12-783-421.mis

<220>
<221> primer_bind
<222> 421..439
<223> 12-783-421.mis complement

<220>
<221> primer_bind
<222> 1..19
<223> 12-783.pu

<220>
<221> primer_bind
<222> 429..449
<223> 12-783.rp complement

<220>
<221> allele
<222> 72
<223> 12-783-73 : polymorphic base G or C

<220>
<221> misc_binding
<222> 60..84
<223> 12-783-73.probe

<220>
<221> primer_bind
<222> 53..71
<223> 12-783-73.mis

<220>
<221> primer_bind
<222> 73..91
<223> 12-783-73.mis complement
<400> 8

atgactgtga gaacgaaggg gtaaactggg acaagatccg gtgattgcag aaggcttctg 60
aggaggcgac astgcacaga gacctaagtg aagtgaggag cccttgcctc tgaggagaca 120
acacttctgg tggccagaaa ttcagtataa ccaggttcag aacaaacaca actgactctg 180
ggttagcata aaactcaacc agcaggagca gaagtcccca ggcagacgcg agcagagcct 240
gctggcaacc gtgagcattg gtcagcgtgg acattggaca aggggcttcc ttgctcagcc 300
ctcattcctc taacatggtt ctctcctgtg ttctgcatgt aggcctttga ggattggaat 360
aagacagagc tagactcatt cctgattgaa atcacagcca atattctcaa gttccaagay 420
accgatggca aacacctgct gccaaagat 449



<210> 9 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerPU
<400> 9 

tgtaaaacga cggccagt 

<210> 10 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> sequencing oligonucleotide PrimerRP 
<400> 10 

caggaaacag ctatgacc